DIAGNOSIS AND PROGNOSIS OF BREAST CANCER PATIENTS


This application claims benefit of United States Provisional Application No. 
60/298,918, filed June 18, 2001, and United States Provisional Application No. 60/380,710, 
filed on May 14, 2002, each of which is incorporated by reference herein in its entirety. 

This application includes a Sequence Listing submitted on compact disc, 
recorded on two compact discs, including one duplicate, containing Filename
9301 175228.txt, of size 6,755,971 bytes, created June 13, 2002. The sequence listing on the 
compact discs is incorporated by reference herein in its entirety. 

1. FIELD OF THE INVENTION 

The present invention relates to the identification of marker genes useful in
the diagnosis and prognosis of breast cancer. More particularly, the invention relates to the 
identification of a set of marker genes associated with breast cancer, a set of marker genes 
differentially expressed in estrogen receptor (+) versus estrogen receptor (-) tumors, a set of 
marker genes differentially expressed in BRCA1 versus sporadic tumors, and a set of marker 

genes differentially expressed in sporadic tumors from patients with good clinical prognosis
(i.e., metastasis- or disease-free >5 years) versus patients with poor clinical prognosis (i.e.,
metastasis- or disease-free <5 years). For each of the marker sets above, the invention 
further relates to methods of distinguishing the breast cancer-related conditions. The 
invention further provides methods for determining the course of treatment of a patient with 

breast cancer.

2. BACKGROUND OF THE INVENTION 
The increased number of cancer cases reported in the United States, and, 
indeed, around the world, is a major concern. Currently there are only a handful of 
treatments available for specific types of cancer, and these provide no guarantee of success.
In order to be most effective, these treatments require not only an early detection of the 
malignancy, but a reliable assessment of the severity of the malignancy. 

The incidence of breast cancer, a leading cause of death in women, has been 
gradually increasing in the United States over the last thirty years. Its cumulative risk is 
relatively high; 1 in 8 women are expected to develop some type of breast cancer by age 85
in the United States. In fact, breast cancer is the most common cancer in women and the
second most common cause of cancer death in the United States. In 1997, it was estimated 
that 181,000 new cases were reported in the U.S., and that 44,000 people would die of 
breast cancer (Parker et al., CA Cancer J. Clin. 47:5-27 (1997); Chu et al., J. Nat. Cancer
Inst. 88:1571-1579 (1996)). While the mechanism of tumorigenesis for most breast carcinomas
is largely unknown, there are genetic factors that can predispose some women to developing 
breast cancer (Miki et al., Science 266:66-71 (1994)). The discovery and characterization of
BRCA1 and BRCA2 has recently expanded our knowledge of genetic factors which can 
contribute to familial breast cancer. Germ-line mutations within these two loci are 

associated with a 50 to 85% lifetime risk of breast and/or ovarian cancer (Casey, Curr.
Opin. Oncol. 9:88-93 (1997); Marcus et al., Cancer 77:697-709 (1996)). Only about 5% to
10% of breast cancers are associated with breast cancer susceptibility genes, BRCA1 and 
BRCA2. The cumulative lifetime risk of breast cancer for women who carry the mutant 
BRCA1 is predicted to be approximately 92%, while the cumulative lifetime risk for the 

non-carrier majority is estimated to be approximately 10%. BRCA1 is a tumor suppressor
gene that is involved in DNA repair and cell cycle control, which are both important for the
maintenance of genomic stability. More than 90% of all mutations reported so far result in 
a premature truncation of the protein product with abnormal or abolished function. The 
histology of breast cancer in BRCA1 mutation carriers differs from that in sporadic cases, 

but mutation analysis is the only way to find the carrier. Like BRCA1, BRCA2 is involved
in the development of breast cancer, and like BRCA1 plays a role in DNA repair. However, 
unlike BRCA1, it is not involved in ovarian cancer. 

Other genes have been linked to breast cancer, for example c-erb-2 (HER2)
and p53 (Beenken et al., Ann. Surg. 233(5):630-638 (2001)). Overexpression of c-erb-2
(HER2) and p53 has been correlated with poor prognosis (Rudolph et al., Hum. Pathol.
32(3):311-319 (2001)), as have aberrant expression products of mdm2 (Lukas et al.,
Cancer Res. 61(7):3212-3219 (2001)) and cyclin1 and p27 (Porter & Roberts, International
Publication WO98/33450, published August 6, 1998). However, no other clinically useful
markers consistently associated with breast cancer have been identified.

Sporadic tumors, those not currently associated with a known germline
mutation, constitute the majority of breast cancers. It is likely that other, non-genetic
factors also have a significant effect on the etiology of the disease. Regardless of the
cancer's origin, breast cancer morbidity and mortality increase significantly if it is not
detected early in its progression. Thus, considerable effort has focused on the early detection
of cellular transformation and tumor formation in breast tissue.
A marker-based approach to tumor identification and characterization
promises improved diagnostic and prognostic reliability. Typically, the diagnosis of breast 
cancer requires histopathological proof of the presence of the tumor. In addition to 
diagnosis, histopathological examinations also provide information about prognosis and 
selection of treatment regimens. Prognosis may also be established based upon clinical
parameters such as tumor size, tumor grade, the age of the patient, and lymph node 
metastasis. 

Diagnosis and/or prognosis may be determined to varying degrees of 
effectiveness by direct examination of the outside of the breast, or through mammography 

or other X-ray imaging methods (Jatoi, Am. J. Surg. 177:518-524 (1999)). The latter
approach is not without considerable cost, however. Every time a mammogram is taken, the 
patient incurs a small risk of having a breast tumor induced by the ionizing properties of the 
radiation used during the test. In addition, the process is expensive and the subjective 
interpretations of a technician can lead to imprecision. For example, one study showed 

major clinical disagreements for about one-third of a set of mammograms that were
interpreted individually by a surveyed group of radiologists. Moreover, many women find 
that undergoing a mammogram is a painful experience. Accordingly, the National Cancer 
Institute has not recommended mammograms for women under fifty years of age, since this 
group is not as likely to develop breast cancers as are older women. It is compelling to note, 

however, that while only about 22% of breast cancers occur in women under fifty, data
suggest that breast cancer is more aggressive in pre-menopausal women.

In clinical practice, accurate diagnosis of various subtypes of breast cancer is 
important because treatment options, prognosis, and the likelihood of therapeutic response 
all vary broadly depending on the diagnosis. Accurate prognosis, or determination of 

distant metastasis-free survival, could allow the oncologist to tailor the administration of
adjuvant chemotherapy, with women having poorer prognoses being given the most 
aggressive treatment. Furthermore, accurate prediction of poor prognosis would greatly 
impact clinical trials for new breast cancer therapies, because potential study patients could 
then be stratified according to prognosis. Trials could then be limited to patients having 

poor prognosis, in turn making it easier to discern if an experimental therapy is efficacious.

To date, no set of satisfactory predictors for prognosis based on the clinical 
information alone has been identified. The detection of BRCA1 or BRCA2 mutations
represents a step towards the design of therapies to better control and prevent the 
appearance of these tumors. However, there is no equivalent means for the diagnosis of 

patients with sporadic tumors, the most common type of breast cancer tumor, nor is there a
means of differentiating subtypes of breast cancer. 

3. SUMMARY OF THE INVENTION 

The invention provides gene marker sets that distinguish various types and
subtypes of breast cancer, and methods of use therefor. In one embodiment, the invention 
provides a method for classifying a cell sample as ER(+) or ER(-) comprising detecting a 
difference in the expression of a first plurality of genes relative to a control, said first 
plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in 

Table 1. In specific embodiments, said plurality of genes consists of at least 50, 100, 200,
500, 1000, up to 2,460 of the gene markers listed in Table 1. In another specific 
embodiment, said plurality of genes consists of each of the genes corresponding to the 2,460
markers listed in Table 1. In another specific embodiment, said plurality consists of the 550
markers listed in Table 2. In another specific embodiment, said control comprises nucleic 

acids derived from a pool of tumors from individual sporadic patients. In another specific
embodiment, said detecting comprises the steps of: (a) generating an ER(+) template by
hybridization of nucleic acids derived from a plurality of ER(+) patients within a plurality of 
sporadic patients against nucleic acids derived from a pool of tumors from individual 
sporadic patients; (b) generating an ER(-) template by hybridization of nucleic acids derived 

from a plurality of ER(-) patients within said plurality of sporadic patients against nucleic
acids derived from said pool of tumors from individual sporadic patients within said
plurality; (c) hybridizing nucleic acids derived from an individual sample against said pool;
and (d) determining the similarity of marker gene expression in the individual sample to the 
ER(+) template and the ER(-) template, wherein if said expression is more similar to the 

ER(+) template, the sample is classified as ER(+), and if said expression is more similar to
the ER(-) template, the sample is classified as ER(-). 

The invention further provides the above methods, applied to the 
classification of samples as BRCA1 or sporadic, and classifying patients as having good 
prognosis or poor prognosis. For the BRCA1/sporadic gene markers, the invention provides
that the method may be used wherein the plurality of genes is at least 5, 20, 50, 100, 200 or
300 of the BRCA1/sporadic markers listed in Table 3. In a specific embodiment, the
optimum 100 markers listed in Table 4 are used. For the prognostic markers, the invention 
provides that at least 5, 20, 50, 100, or 200 gene markers listed in Table 5 may be used. In a 
specific embodiment, the optimum 70 markers listed in Table 6 are used. 

The invention further provides that markers may be combined. Thus, in one
embodiment, at least 5 markers from Table 1 are used in conjunction with at least 5 markers 
from Table 3. In another embodiment, at least 5 markers from Table 5 are used in 
conjunction with at least 5 markers from Table 3. In another embodiment, at least 5 
markers from Table 1 are used in conjunction with at least 5 markers from Table 5. In
another embodiment, at least 5 markers from each of Tables 1, 3, and 5 are used 
simultaneously. 

The invention further provides a method for classifying a sample as ER(+) or 
ER(-) by calculating the similarity between the expression of at least 5 of the markers listed 

in Table 1 in the sample to the expression of the same markers in an ER(-) nucleic acid pool
and an ER(+) nucleic acid pool, comprising the steps of: (a) labeling nucleic acids derived 
from a sample, with a first fluorophore to obtain a first pool of fluorophore-labeled nucleic 
acids; (b) labeling with a second fluorophore a first pool of nucleic acids derived from two 
or more ER(+) samples, and a second pool of nucleic acids derived from two or more ER(-) 

samples; (c) contacting said first fluorophore-labeled nucleic acid and said first pool of
second fluorophore-labeled nucleic acid with said first microarray under conditions such
that hybridization can occur, and contacting said first fluorophore-labeled nucleic acid and
said second pool of second fluorophore-labeled nucleic acid with said second microarray 
under conditions such that hybridization can occur, detecting at each of a plurality of 

discrete loci on the first microarray a first fluorescent emission signal from said first
fluorophore-labeled nucleic acid and a second fluorescent emission signal from said first 
pool of second fluorophore-labeled genetic matter that is bound to said first microarray 
under said conditions, and detecting at each of the marker loci on said second microarray 
said first fluorescent emission signal from said first fluorophore-labeled nucleic acid and a 

third fluorescent emission signal from said second pool of second fluorophore-labeled
nucleic acid; (d) determining the similarity of the sample to the ER(-) and ER(+) pools by 
comparing said first fluorescence emission signals and said second fluorescence emission 
signals, and said first emission signals and said third fluorescence emission signals; and (e) 
classifying the sample as ER(+) where the first fluorescence emission signals are more 

similar to said second fluorescence emission signals than to said third fluorescent emission
signals, and classifying the sample as ER(-) where the first fluorescence emission signals 
are more similar to said third fluorescence emission signals than to said second fluorescent 
emission signals, wherein said similarity is defined by a statistical method. The invention 
further provides that the other disclosed marker sets may be used in the above method to 
distinguish BRCA1 from sporadic tumors, and patients with poor prognosis from patients
with good prognosis. 

In a specific embodiment, said similarity is calculated by determining a first 
sum of the differences of expression levels for each marker between said first fluorophore-
labeled nucleic acid and said first pool of second fluorophore-labeled nucleic acid, and a
second sum of the differences of expression levels for each marker between said first 
fluorophore-labeled nucleic acid and said second pool of second fluorophore-labeled 
nucleic acid, wherein if said first sum is greater than said second sum, the sample is 
classified as ER(-), and if said second sum is greater than said first sum, the sample is 

classified as ER(+). In another specific embodiment, said similarity is calculated by
computing a first classifier parameter $P_1$ between an ER(+) template and the expression of
said markers in said sample, and a second classifier parameter $P_2$ between an ER(-) template
and the expression of said markers in said sample, wherein said $P_1$ and $P_2$ are calculated
according to the formula:

$P_i = (\vec{z}_i \cdot \vec{y}) / (\|\vec{z}_i\| \cdot \|\vec{y}\|)$    Equation (1)

wherein $\vec{z}_1$ and $\vec{z}_2$ are ER(-) and ER(+) templates, respectively, and are calculated by
averaging said second fluorescence emission signal for each of said markers in said first
pool of second fluorophore-labeled nucleic acid and said third fluorescence emission signal
for each of said markers in said second pool of second fluorophore-labeled nucleic acid,
respectively, and wherein $\vec{y}$ is said first fluorescence emission signal of each of said
markers in the sample to be classified as ER(+) or ER(-), wherein the expression of the
markers in the sample is similar to ER(+) if $P_1 < P_2$, and similar to ER(-) if $P_1 > P_2$.
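
By way of illustration only, the correlation of Equation (1) and the assignment of a sample to the template it most resembles may be sketched as follows. This is a minimal sketch, not the implementation of the invention: the function names, the dictionary of templates keyed by label, and the four-marker log-ratio values are all hypothetical, and each template is assumed to be the per-marker mean of its reference group.

```python
import math

def cosine_correlation(z, y):
    """Equation (1)-style parameter: (z . y) / (||z|| * ||y||)."""
    dot = sum(a * b for a, b in zip(z, y))
    norm_z = math.sqrt(sum(a * a for a in z))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_z * norm_y)

def mean_template(profiles):
    """Average each marker's value over a group of reference profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def classify(sample, templates):
    """Return the label of the template most correlated with the sample."""
    scores = {label: cosine_correlation(z, sample)
              for label, z in templates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical log-ratio values for four markers (illustrative only).
er_pos_profiles = [[1.0, 0.8, -0.9, -1.1], [1.2, 0.6, -0.7, -0.9]]
er_neg_profiles = [[-1.0, -0.7, 0.9, 1.0], [-0.8, -0.9, 1.1, 0.8]]
templates = {
    "ER(+)": mean_template(er_pos_profiles),
    "ER(-)": mean_template(er_neg_profiles),
}

label, scores = classify([0.9, 0.5, -0.6, -1.0], templates)
print(label, scores)
```

Keying the templates by label rather than by index keeps the sketch independent of which template is numbered first; a profile of all zeros would need separate handling, since its norm is zero.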

The invention further provides a method for identifying marker genes the 
expression of which is associated with a particular phenotype. In one embodiment, the 
invention provides a method for determining a set of marker genes whose expression is
associated with a particular phenotype, comprising the steps of: (a) selecting the phenotype
having two or more phenotype categories; (b) identifying a plurality of genes wherein the
expression of said genes is correlated or anticorrelated with one of the phenotype categories,
and wherein the correlation coefficient for each gene is calculated according to the equation

$\rho = (\vec{c} \cdot \vec{r}) / (\|\vec{c}\| \cdot \|\vec{r}\|)$    Equation (2)

wherein $\vec{c}$ is a number representing said phenotype category and $\vec{r}$ is the logarithmic
expression ratio across all the samples for each individual gene, wherein if the correlation 
coefficient has an absolute value of a threshold value or greater, said expression of said
gene is associated with the phenotype category, and wherein said plurality of genes is a set 
of marker genes whose expression is associated with a particular phenotype. The threshold 
depends upon the number of samples used; the threshold can be calculated as $3 \times 1/\sqrt{n-3}$,
where $1/\sqrt{n-3}$ is the distribution width and n = the number of samples. In a specific
embodiment where n = 98, said threshold value is 0.3. In a specific embodiment, said set of
marker genes is validated by: (a) using a statistical method to randomize the association 
between said marker genes and said phenotype category, thereby creating a control 
correlation coefficient for each marker gene; (b) repeating step (a) one hundred or more 

times to develop a frequency distribution of said control correlation coefficients for each
marker gene; (c) determining the number of marker genes having a control correlation
coefficient of a threshold value or above, thereby creating a control marker gene set; and (d) 
comparing the number of control marker genes so identified to the number of marker 
genes, wherein if the p value of the difference between the number of marker genes and the 

number of control genes is less than 0.01, said set of marker genes is validated. In another
specific embodiment, said set of marker genes is optimized by the method comprising: (a) 
rank-ordering the genes by amplitude of correlation or by significance of the correlation 
coefficients, and (b) selecting an arbitrary number of marker genes from the top of the rank- 
ordered list. The threshold value depends upon the number of samples tested. 
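
The selection, validation, and rank-ordering steps described above may be sketched, purely for illustration, as follows. The sketch makes several assumptions that are not taken from the specification: phenotype categories are coded as +1 and -1, the threshold is taken as 3/sqrt(n - 3) (which gives approximately 0.3 for n = 98), the randomization is performed by shuffling the category labels, and the comparison with the control runs is reduced to a simple empirical fraction. All function names and expression values are hypothetical.

```python
import math
import random

def correlation(c, r):
    """Equation (2)-style coefficient: (c . r) / (||c|| * ||r||)."""
    dot = sum(a * b for a, b in zip(c, r))
    norm = math.sqrt(sum(a * a for a in c)) * math.sqrt(sum(b * b for b in r))
    return dot / norm if norm else 0.0

def select_markers(expr, categories, threshold):
    """Keep genes whose |correlation| with the categories meets the threshold.

    expr maps a gene name to its log ratios, one value per sample.
    """
    selected = {}
    for gene, ratios in expr.items():
        rho = correlation(categories, ratios)
        if abs(rho) >= threshold:
            selected[gene] = rho
    return selected

def rank_markers(selected):
    """Order the selected genes by amplitude of correlation, largest first."""
    return sorted(selected, key=lambda g: abs(selected[g]), reverse=True)

def permutation_counts(expr, categories, threshold, runs, seed=0):
    """Count genes passing the threshold after shuffling the category labels."""
    rng = random.Random(seed)
    shuffled = list(categories)
    counts = []
    for _ in range(runs):
        rng.shuffle(shuffled)
        counts.append(len(select_markers(expr, shuffled, threshold)))
    return counts

def empirical_p(counts, observed):
    """Fraction of control runs yielding at least as many genes as observed."""
    return sum(1 for k in counts if k >= observed) / len(counts)

# Assumed threshold form; about 0.31 for n = 98 samples.
print(round(3 / math.sqrt(98 - 3), 2))

# Hypothetical log ratios for three genes across four samples.
categories = [1, 1, -1, -1]
expr = {
    "geneA": [2.0, 1.5, -1.0, -2.0],
    "geneB": [1.0, -1.0, 1.0, -1.0],
    "geneC": [-1.0, -2.0, 1.5, 1.0],
}
markers = select_markers(expr, categories, threshold=0.5)
print(rank_markers(markers))
control = permutation_counts(expr, categories, threshold=0.5, runs=100)
print(empirical_p(control, len(markers)))
```

With only four samples the control distribution is too coarse to be meaningful; the toy data serve only to show the flow from correlation, to thresholding, to comparison against randomized labels.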

The invention further provides a method for assigning a person to one of a
plurality of categories in a clinical trial, comprising determining for each said person the
level of expression of at least five of the prognosis markers listed in Table 6, determining
therefrom whether the person has an expression pattern that correlates with a good 
prognosis or a poor prognosis, and assigning said person to one category in a clinical trial if 

said person is determined to have a good prognosis, and a different category if that person is
determined to have a poor prognosis. The invention further provides a method for assigning 
a person to one of a plurality of categories in a clinical trial, where each of said categories is 
associated with a different phenotype, comprising determining for each said person the level 
of expression of at least five markers from a set of markers, wherein said set of markers 

includes markers associated with each of said clinical categories, determining therefrom
whether the person has an expression pattern that correlates with one of the clinical
categories, and assigning said person to one of said categories if said person is determined to
have a phenotype associated with that category. 

The invention further provides a method of classifying a first cell or 

organism as having one of at least two different phenotypes, said at least two different
phenotypes comprising a first phenotype and a second phenotype, said method comprising:
(a) comparing the level of expression of each of a plurality of genes in a first sample from
the first cell or organism to the level of expression of each of said genes, respectively, in a
pooled sample from a plurality of cells or organisms, said plurality of cells or organisms 

comprising different cells or organisms exhibiting said at least two different phenotypes,
respectively, to produce a first compared value; (b) comparing said first compared value to a 
second compared value, wherein said second compared value is the product of a method 
comprising comparing the level of expression of each of said genes in a sample from a cell 
or organism characterized as having said first phenotype to the level of expression of each 

of said genes, respectively, in said pooled sample; (c) comparing said first compared value
to a third compared value, wherein said third compared value is the product of a method 
comprising comparing the level of expression of each of said genes in a sample from a cell 
or organism characterized as having said second phenotype to the level of expression of 
each of said genes, respectively, in said pooled sample; (d) optionally carrying out one or
more times a step of comparing said first compared value to one or more additional
compared values, respectively, each additional compared value being the product of a 
method comprising comparing the level of expression of each of said genes in a sample 
from a cell or organism characterized as having a phenotype different from said first and 
second phenotypes but included among said at least two different phenotypes, to the level of 

expression of each of said genes, respectively, in said pooled sample; and (e) determining to
which of said second, third and, if present, one or more additional compared values, said 
first compared value is most similar, wherein said first cell or organism is determined to 
have the phenotype of the cell or organism used to produce said compared value most 
similar to said first compared value. 

In a specific embodiment of the above method, said compared values are
each ratios of the levels of expression of each of said genes. In another specific 
embodiment, each of said levels of expression of each of said genes in said pooled sample 
are normalized prior to any of said comparing steps. In another specific embodiment, 
normalizing said levels of expression is carried out by dividing each of said levels of 

expression by the median or mean level of expression of each of said genes or dividing by
the mean or median level of expression of one or more housekeeping genes in said pooled 
sample. In a more specific embodiment, said normalized levels of expression are subjected 
to a log transform and said comparing steps comprise subtracting said log transform from 
the log of said levels of expression of each of said genes in said sample from said cell or
organism. In another specific embodiment, said at least two different phenotypes are
different stages of a disease or disorder. In another specific embodiment, said at least two
different phenotypes are different prognoses of a disease or disorder. In yet another specific 
embodiment, said levels of expression of each of said genes, respectively, in said pooled 
sample or said levels of expression of each of said genes in a sample from said cell or 
organism characterized as having said first phenotype, said second phenotype, or said
phenotype different from said first and second phenotypes, respectively, are stored on a 
computer. 
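
The normalization and log-transform embodiment may be sketched, by way of example only, as follows. The sketch assumes median normalization and base-10 logarithms, and it additionally normalizes the individual sample by its own median so that the two profiles are on a comparable scale; that last step, the function names, and the expression levels are assumptions made for illustration.

```python
import math
from statistics import median

def normalize(levels):
    """Divide every expression level by the median level of the profile."""
    m = median(levels)
    return [x / m for x in levels]

def log_ratios(sample_levels, pool_levels):
    """Per-gene compared value: log10(sample) - log10(pool).

    Both profiles are median-normalized first (normalizing the sample
    is an assumption of this sketch).
    """
    sample_norm = normalize(sample_levels)
    pool_norm = normalize(pool_levels)
    return [math.log10(s) - math.log10(p)
            for s, p in zip(sample_norm, pool_norm)]

# Hypothetical expression levels for three genes.
pool = [100.0, 200.0, 400.0]
sample = [1000.0, 200.0, 100.0]
print(log_ratios(sample, pool))
```

A positive value indicates a gene more abundant in the sample than in the pool and a negative value the reverse; the resulting vector is the kind of compared value that would then be matched against the compared values of each reference phenotype.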

The invention further provides microarrays comprising the disclosed marker 
sets. In one embodiment, the invention provides a microarray comprising at least 5 markers 
derived from any one of Tables 1-6, wherein at least 50% of the probes on the microarray
are present in any one of Tables 1-6. In more specific embodiments, at least 60%, 70%,
80%, 90%, 95% or 98% of the probes on said microarray are present in any one of Tables 1-6.
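
As a simple illustration of the composition requirement, the following sketch checks whether a list of probes contains at least a minimum number of listed markers and whether at least a given fraction of its probes appears in a marker table. The probe identifiers and function names are hypothetical and are not taken from Tables 1-6.

```python
def fraction_in_tables(array_probes, table_markers):
    """Fraction of probes on the array that also appear in the marker table."""
    if not array_probes:
        return 0.0
    hits = sum(1 for probe in array_probes if probe in table_markers)
    return hits / len(array_probes)

def meets_composition(array_probes, table_markers,
                      min_markers=5, min_fraction=0.5):
    """True if the array carries enough listed markers and a large enough
    share of its probes comes from the marker table."""
    listed = set(array_probes) & set(table_markers)
    return (len(listed) >= min_markers
            and fraction_in_tables(array_probes, table_markers) >= min_fraction)

# Hypothetical identifiers: five listed markers plus three unrelated probes.
table_markers = {"M1", "M2", "M3", "M4", "M5", "M6"}
array_probes = ["M1", "M2", "M3", "M4", "M5", "X1", "X2", "X3"]
print(fraction_in_tables(array_probes, table_markers))
print(meets_composition(array_probes, table_markers))
```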

In another embodiment, the invention provides a microarray for 

distinguishing ER(+) and ER(-) cell samples comprising a positionally-addressable array of
polynucleotide probes bound to a support, said polynucleotide probes comprising a plurality 
of polynucleotide probes of different nucleotide sequences, each of said different nucleotide 
sequences comprising a sequence complementary and hybridizable to a plurality of genes, 
said plurality consisting of at least 5 of the genes corresponding to the markers listed in 

Table 1 or Table 2, wherein at least 50% of the probes on the microarray are present in any
one of Table 1 or Table 2. In yet another embodiment, the invention provides a microarray
for distinguishing BRCA1-type and sporadic tumor-type cell samples comprising a
positionally-addressable array of polynucleotide probes bound to a support, said 
polynucleotide probes comprising a plurality of polynucleotide probes of different 

nucleotide sequences, each of said different nucleotide sequences comprising a sequence
complementary and hybridizable to a plurality of genes, said plurality consisting of at least 5 
of the genes corresponding to the markers listed in Table 3 or Table 4, wherein at least 50% 
of the probes on the microarray are present in any one of Table 3 or Table 4. In still another 
embodiment, the invention provides a microarray for distinguishing cell samples from 

patients having a good prognosis and cell samples from patients having a poor prognosis
comprising a positionally-addressable array of polynucleotide probes bound to a support, 
said polynucleotide probes comprising a plurality of polynucleotide probes of different 
nucleotide sequences, each of said different nucleotide sequences comprising a sequence 
complementary and hybridizable to a plurality of genes, said plurality consisting of at least 5 

of the genes corresponding to the markers listed in Table 5 or Table 6, wherein at least 50%
of the probes on the microarray are present in any one of Table 5 or Table 6. The invention
further provides for microarrays comprising at least 5, 20, 50, 100, 200, 500, 1,000, 1,250,
1,500, 1,750, or 2,000 of the ER-status marker genes listed in Table 1, at least 5, 20, 50,
100, 200, or 300 of the BRCA1/sporadic marker genes listed in Table 3, or at least 5, 20, 50,
100 or 200 of the prognostic marker genes listed in Table 5, in any combination, wherein at
least 50%, 60%, 70%, 80%, 90%, 95% or 98% of the probes on said microarrays are present
in Table 1, Table 3 and/or Table 5.

The invention further provides a kit for determining the ER-status of a 
sample, comprising at least two microarrays each comprising at least 5 of the markers listed 

in Table 1, and a computer system for determining the similarity of the level of nucleic acid
derived from the markers listed in Table 1 in a sample to that in an ER(-) pool and an ER(+) 
pool, the computer system comprising a processor, and a memory encoding one or more 
programs coupled to the processor, wherein the one or more programs cause the processor 
to perform a method comprising computing the aggregate differences in expression of each 

marker between the sample and ER(-) pool and the aggregate differences in expression of
each marker between the sample and ER(+) pool, or a method comprising determining the 
correlation of expression of the markers in the sample to the expression in the ER(-) and 
ER(+) pools, said correlation calculated according to Equation (4). The invention provides 
for kits able to distinguish BRCA1 and sporadic tumors, and samples from patients with 

good prognosis from samples from patients with poor prognosis, by inclusion of the
appropriate marker gene sets. The invention further provides a kit for determining whether 
a sample is derived from a patient having a good prognosis or a poor prognosis, comprising 
at least one microarray comprising probes to at least 5 of the genes corresponding to the 
markers listed in Table 5, and a computer readable medium having recorded thereon one or 

more programs for determining the similarity of the level of nucleic acid derived from the
markers listed in Table 5 in a sample to that in a pool of samples derived from individuals 
having a good prognosis and a pool of samples derived from individuals having a poor
prognosis, wherein the one or more programs cause a computer to perform a method 
comprising computing the aggregate differences in expression of each marker between the 

30 sample and the good prognosis pool and the aggregate differences in expression of each 
marker between the sample and the poor prognosis pool, or a method comprising 
determining the correlation of expression of the markers in the sample to the expression in 
the good prognosis and poor prognosis pools, said correlation calculated according to 
Equation (3). 

4. BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 is a Venn-type diagram showing the overlap between the marker sets 
disclosed herein, including the 2,460 ER markers, the 430 BRCA1/sporadic markers, and the
231 prognosis reporters. 
FIG. 2 shows the experimental procedures for measuring differential changes
in mRNA transcript abundance in breast cancer tumors used in this study. In each 
experiment, Cy5-labeled cRNA from one tumor X is hybridized on a 25k human microarray 
together with a Cy3-labeled cRNA pool made of cRNA samples from tumors 1, 2, ... N. 
The digital expression data were obtained by scanning and image processing. The error 
modeling allowed us to assign a p-value to each transcript ratio measurement.

FIG. 3 Two-dimensional clustering reveals two distinctive types of tumors. 
The clustering was based on the gene expression data of 98 breast cancer tumors over 4986 
significant genes. Dark gray (red) represents up-regulation, light gray (green) represents
down-regulation, black indicates no change in expression, and gray indicates that data is not
available. 4986 genes were selected that showed a more than two fold change in expression
ratios in more than five experiments. Selected clinical data for test results of BRCA1
mutations, estrogen receptor (ER), and progesterone receptor (PR), tumor grade, lymphocytic
infiltrate, and angioinvasion are shown at right. Black denotes negative and white denotes
positive. The dominant pattern in the lower part consists of 36 patients, out of which 34 are
ER-negative (total 39), and 16 are BRCA1-mutation carriers (total 18).

FIG. 4 A portion of unsupervised clustered results as shown in FIG. 3. 
ESR1 (the estrogen receptor gene) is co-regulated with a set of genes that are strongly
co-regulated to form a dominant pattern.

FIG. 5A Histogram of correlation coefficients of significant genes between
their expression ratios and estrogen-receptor (ER) status (i.e., ER level). The histogram for
experimental data is shown as a gray line. The results of one Monte-Carlo trial are shown in
solid black. There are 2,460 genes whose expression data correlate with ER status at a level 
higher than 0.3 or anti-correlated with ER status at a level lower than -0.3. 

FIG. 5B The distribution of the number of genes that satisfied the same 
selection criteria (amplitude of correlation above 0.3) from 10,000 Monte-Carlo runs. It is
estimated that this set of 2,460 genes reports ER status at a confidence level of p >99.99%. 
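
The Monte-Carlo control described for FIGS. 5A and 5B can be sketched as follows, under stated assumptions: the sample labels (e.g., ER status coded as 0/1) are randomly shuffled, and for each shuffle the number of genes whose correlation amplitude exceeds the threshold is counted. The function names, the 0/1 label coding, and the use of label shuffling as the randomization step are assumptions of this sketch, not details confirmed by the text.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / (sxx * syy) ** 0.5


def count_correlated(expr, labels, threshold=0.3):
    """Count genes whose |correlation| with the labels exceeds the threshold."""
    return sum(1 for gene in expr if abs(pearson(gene, labels)) > threshold)


def monte_carlo_control(expr, labels, threshold=0.3, runs=10000, seed=0):
    """Return (observed count, fraction of shuffled runs reaching that count)."""
    rng = random.Random(seed)
    observed = count_correlated(expr, labels, threshold)
    shuffled = list(labels)
    reached = 0
    for _ in range(runs):
        rng.shuffle(shuffled)
        if count_correlated(expr, shuffled, threshold) >= observed:
            reached += 1
    return observed, reached / runs
```

A small returned fraction would correspond to a high confidence that the observed number of correlated genes is not a chance result.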

FIG. 6 Classification Type 1 and Type 2 error rates as a function of the 
number (out of 2,460) of marker genes used in the classifier. The combined error rate is 
lowest when approximately 550 marker genes are used. 

FIG. 7 Classification of 98 tumor samples as ER(+) or ER(-) based on 
expression levels of the 550 optimal marker genes. ER(+) samples (above white line) 
exhibit a clearly different expression pattern than ER(-) samples (below white line). 

FIG. 8 Correlation between expression levels in samples from each patient 
and the average profile of the ER(-) group vs. correlation with the ER(+) group. Squares 
represent samples from clinically ER(-) patients; dots represent samples from clinically 
ER(+) patients. 
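
The comparison plotted in FIG. 8 can be illustrated with a minimal sketch in which each sample is correlated with the average profile of each group. The decision rule used here (assign the sample to the group whose average profile it correlates with more strongly) is an assumption made for illustration, as are the function names and data layout.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mean_profile(samples):
    """Average expression of each marker gene over a group of samples."""
    count = len(samples)
    return [sum(s[g] for s in samples) / count for g in range(len(samples[0]))]


def classify(sample, er_neg_profile, er_pos_profile):
    """Return (label, r_neg, r_pos) for one sample (hypothetical rule)."""
    r_neg = pearson(sample, er_neg_profile)
    r_pos = pearson(sample, er_pos_profile)
    label = "ER(-)" if r_neg > r_pos else "ER(+)"
    return label, r_neg, r_pos
```

Plotting r_neg against r_pos for every sample would give a scatter of the kind the caption describes.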

FIG. 9A Histogram of correlation coefficients of gene expression ratio of 
each significant gene with the BRCA1 mutation status is shown as a solid line. The dashed 
line indicates a frequency distribution obtained from one Monte-Carlo run. 430 genes 
exhibited an amplitude of correlation or anti-correlation greater than 0.35. 

FIG. 9B Frequency distribution of the number of genes that exhibit an 
amplitude of correlation or anti-correlation greater than 0.35 for the 10,000 Monte-Carlo 
run control. Mean = 115. p(n > 430) = 0.48% and p(n > 430/2) = 9.0%. 

FIG. 10 Classification type 1 and type 2 error rates as a function of the 
number of discriminating genes used in the classifier (template). The combined error rate is 
lowest when approximately 100 discriminating marker genes are used. 

FIG. 11A The classification of 38 tumors in the ER(-) group into two 
subgroups, BRCA1 and sporadic, by using the optimal set of 100 discriminating marker 
genes. Patients above the white line are characterized by BRCA1-related patterns. 

FIG. 11B Correlation between expression levels in samples from each ER(-) 
patient and the average profile of the BRCA1 group vs. correlation with the sporadic group. 
Squares represent samples from patients with sporadic-type tumors; dots represent samples 
from patients carrying the BRCA1 mutation. 

FIG. 12A Histogram of correlation coefficients of gene expression ratio of 
each significant gene with the prognostic category (distant metastases group and no distant 
metastases group) is shown as a solid line. The distribution obtained from one Monte-Carlo 
run is shown as a dashed line. The amplitude of correlation or anti-correlation of 231 
marker genes is greater than 0.3. 

FIG. 12B Frequency distribution of the number of genes whose amplitude of 
correlation or anti-correlation was greater than 0.3 for 10,000 Monte-Carlo runs. 

FIG. 13 The distant metastases group classification error rate for type 1 and 
type 2 as a function of the number of discriminating genes used in the classifier. The 
combined error rate is lowest when approximately 70 discriminating marker genes are used. 

FIG. 14 Classification of 78 sporadic tumors into two prognostic groups, 
distant metastases (poor prognosis) and no distant metastases (good prognosis) using the 
optimal set of 70 discriminating marker genes. Patients above the white line are 
characterized by good prognosis. Patients below the white line are characterized by poor 
prognosis. 

FIG. 15 Correlation between expression levels in samples from each patient 
and the average profile of the good prognosis group vs. correlation with the poor prognosis 
group. Squares represent samples from patients having a poor prognosis; dots represent 
samples from patients having a good prognosis. Red squares represent the 'reoccurred' 
patients and the blue dots represent the 'non-reoccurred' patients. A total of 13 out of 78 were 
misclassified. 

FIG. 16 The reoccurrence probability as a function of time since diagnosis. 
Group A and group B were predicted by using a leave-one-out method based on the optimal 
set of 70 discriminating marker genes. The 43 patients in group A consist of 37 patients 
from the no distant metastases group and 6 patients from the distant metastases group. The 
35 patients in group B consist of 28 patients from the distant metastases group and 7 
patients from the no distant metastases group. 
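
The leave-one-out method mentioned for FIG. 16 can be sketched as follows. Each sample is held out in turn, group templates are built from the remaining samples, and the held-out sample is assigned to the group whose template it correlates with more strongly. The correlation-based rule, the "good"/"poor" labels, and the function names are assumptions of this sketch; it also keeps the marker genes fixed, whereas a fuller procedure might repeat the marker selection for each held-out sample.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mean_profile(samples):
    """Average expression of each marker gene over a group of samples."""
    count = len(samples)
    return [sum(s[g] for s in samples) / count for g in range(len(samples[0]))]


def leave_one_out(samples, labels):
    """Predict "good" or "poor" for each sample from templates built without it."""
    predictions = []
    for i, held_out in enumerate(samples):
        rest = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        good = mean_profile([s for s, l in rest if l == "good"])
        poor = mean_profile([s for s, l in rest if l == "poor"])
        r_good = pearson(held_out, good)
        r_poor = pearson(held_out, poor)
        predictions.append("good" if r_good > r_poor else "poor")
    return predictions
```

Comparing the predictions with the known outcomes gives a misclassification count of the kind reported in the captions.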

FIG. 17 The distant metastases probability as a function of time since 
diagnosis for ER(+) (yes) or ER(-) (no) individuals. 

FIG. 18 The distant metastases probability as a function of time since 
diagnosis for progesterone receptor (PR)(+) (yes) or PR(-) (no) individuals. 

FIG. 19A, B The distant metastases probability as a function of time since 
diagnosis. Groups were defined by the tumor grades. 

FIG. 20A Classification of 19 independent sporadic tumors into two 
prognostic groups, distant metastases and no distant metastases, using the 70 optimal 
marker genes. Patients above the white line have a good prognosis. Patients below the 
white line have a poor prognosis. 

FIG. 20B Correlation between expression ratios of each patient and the 
average expression ratio of the good prognosis group as defined by the training set versus 
the correlation between expression ratios of each patient and the average expression ratio of 
the poor prognosis training set. Of nine patients in the good prognosis group, three are from 
the "distant metastases group"; of ten patients in the poor prognosis group, one patient is 
from the "no distant metastases group". This error rate of 4 out of 19 is consistent with 13 
out of 78 for the initial 78 patients. 

FIG. 20C The reoccurrence probability as a function of time since diagnosis 
for two groups predicted based on expression of the optimal 70 marker genes. 

FIG. 21A Sensitivity vs. 1-specificity for good prognosis classification. 

FIG. 21B Sensitivity vs. 1-specificity for poor prognosis classification. 

FIG. 21C Total error rate as a function of threshold on the modeled 
likelihood. Six clinical parameters (ER status, PR status, tumor grade, tumor size, patient 
age, and presence or absence of angioinvasion) were used to perform the clinical modeling. 

FIG. 22 Comparison of the log(ratio) of individual samples using the 
"material sample pool" vs. mean subtracted log(intensity) using the "mathematical sample 
pool" for 70 reporter genes in the 78 sporadic tumor samples. The "material sample pool" 
was constructed from the 78 sporadic tumor samples. 

FIG. 23A Results of the "leave one out" cross validation based on single 
channel data. Samples are grouped according to each sample's coefficient of correlation to 
the average "good prognosis" profile and "poor prognosis" profile for the 70 genes 
examined. The white line separates samples from patients classified as having poor 
prognoses (below) and good prognoses (above). 

FIG. 23B Scatter plot of coefficients of correlation to the average expression 
in "good prognosis" samples and "poor prognosis" samples. The false positive rate (i.e., 
rate of incorrectly classifying a sample from a patient having a good prognosis as 
being one from a patient having a poor prognosis) was 10 out of 44, and the false negative 
rate was 6 out of 34. 
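
The counts given for FIG. 23B can be turned into rates with a short calculation. The overall figure assumes the 44 good prognosis and 34 poor prognosis samples together make up the 78 samples; the variable and function names are illustrative only.

```python
def rate(errors, total):
    """Fraction of misclassified samples in a group."""
    return errors / total


false_positive = rate(10, 44)        # good prognosis samples classified as poor
false_negative = rate(6, 34)         # poor prognosis samples classified as good
overall = rate(10 + 6, 44 + 34)      # all misclassified samples out of 78

print(f"{false_positive:.1%} {false_negative:.1%} {overall:.1%}")
# prints: 22.7% 17.6% 20.5%
```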

FIG. 24A Single-channel hybridization data for samples ranked according to 
the coefficients of correlation with the good prognosis classifier. Samples classified as 
"good prognosis" lie above the white line, and those classified as "poor prognosis" lie 
below. 

FIG. 24B Scatterplot of sample correlation coefficients, with three 
incorrectly classified samples lying to the right of the threshold correlation coefficient value. 
The threshold correlation value was set at 0.2727 to limit the false negatives to 
approximately 10% of the samples. 
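
One way a threshold of the kind described for FIG. 24B might be chosen is sketched below. The sketch assumes that a sample is called "good prognosis" when its correlation with the good prognosis classifier exceeds the threshold, and that the 10% limit is applied to the poor prognosis samples; the caption does not state the denominator or the selection procedure, so both are assumptions, as are the function names.

```python
def pick_threshold(correlations, is_poor, max_fraction=0.10):
    """Pick a threshold so that at most `max_fraction` of the poor prognosis
    samples have a correlation strictly above it (hypothetical rule)."""
    poor = sorted((c for c, p in zip(correlations, is_poor) if p), reverse=True)
    if not poor:
        raise ValueError("at least one poor prognosis sample is required")
    allowed = int(max_fraction * len(poor))  # number of tolerated false negatives
    return poor[min(allowed, len(poor) - 1)]


def classify(correlation, threshold):
    """Call a sample "good" only if it lies strictly above the threshold."""
    return "good" if correlation > threshold else "poor"
```

With this rule, lowering max_fraction moves the threshold to the right and trades false negatives for false positives.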


5. DETAILED DESCRIPTION OF THE INVENTION 
5.1 INTRODUCTION 
The invention relates to sets of genetic markers whose expression patterns 
correlate with important characteristics of breast cancer tumors, i.e., estrogen receptor (ER) 
status, BRCA1 status, and the likelihood of relapse (i.e., distant metastasis or poor 
prognosis). More specifically, the invention provides for sets of genetic markers that can 
distinguish the following three clinical conditions. First, the invention relates to sets of 
markers whose expression correlates with the ER status of a patient, and which can be used 
to distinguish ER(+) from ER(-) patients. ER status is a useful prognostic indicator, and an 
indicator of the likelihood that a patient will respond to certain therapies, such as tamoxifen. 
Also, among women who are ER positive the response rate (over 50%) to hormonal therapy 
is much higher than the response rate (less than 10%) in patients whose ER status is negative. In 
patients with ER positive tumors the possibility of achieving a hormonal response is directly 
proportional to the level of ER (P. Calabresi and P.S. Schein, MEDICAL ONCOLOGY (2ND ED.), 
McGraw-Hill, Inc., New York (1993)). Second, the invention further relates to sets of 
markers whose expression correlates with the presence of BRCA1 mutations, and which can 
be used to distinguish BRCA1-type tumors from sporadic tumors. Third, the invention 
relates to genetic markers whose expression correlates with clinical prognosis, and which 
can be used to distinguish patients having good prognoses (i.e., no distant metastases of a 
tumor within five years) from patients having poor prognoses (i.e., distant metastases of a 
tumor within five years). Methods are provided for use of these markers to distinguish between 
these patient groups, and to determine general courses of treatment. Microarrays comprising these 
markers are also provided, as well as methods of constructing such microarrays. Each 
marker corresponds to a gene in the human genome, i.e., such marker is identifiable as all or 
a portion of a gene. Finally, because each of the above markers correlates with certain 
breast cancer-related conditions, the markers, or the proteins they encode, are likely to be 
targets for drugs against breast cancer. 

5.2 DEFINITIONS 

As used herein, "BRCA1 tumor" means a tumor having cells containing a 
mutation of the BRCA1 locus. 

The "absolute amplitude" of correlation expressions means the distance, 
either positive or negative, from a zero value; i.e., both correlation coefficients -0.35 and 
0.35 have an absolute amplitude of 0.35. 
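
The definition of "absolute amplitude" above amounts to taking the absolute value of the correlation coefficient. A minimal sketch follows; the function names and the default 0.35 cut-off (borrowed from the example in the definition) are illustrative assumptions.

```python
def absolute_amplitude(correlation):
    """Distance of a correlation coefficient from zero, ignoring its sign."""
    return abs(correlation)


def exceeds_amplitude(correlation, threshold=0.35):
    """True when a correlation or anti-correlation is stronger than the threshold."""
    return absolute_amplitude(correlation) > threshold
```

Under this definition a gene anti-correlated at -0.4 and a gene correlated at 0.4 are treated alike when markers are selected by amplitude.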

"Status" means a state of gene expression of a set of genetic markers whose 
expression is strongly correlated with a particular phenotype. For example, "ER status" 
means a state of gene expression of a set of genetic markers whose expression is strongly 
correlated with that of ESR1 (estrogen receptor gene), wherein the pattern of these genes' 
expression differs detectably between tumors expressing the receptor and tumors not 
expressing the receptor. 

"Good prognosis" means that a patient is expected to have no distant 
metastases of a breast tumor within five years of initial diagnosis of breast cancer. 

"Poor prognosis" means that a patient is expected to have distant metastases 
of a breast tumor within five years of initial diagnosis of breast cancer. 

"Marker" means an entire gene, or an EST derived from that gene, the 
expression or level of which changes between certain conditions. Where the expression of 
the gene correlates with a certain condition, the gene is a marker for that condition. 

"Marker-derived polynucleotides" means the RNA transcribed from a marker 
gene, any cDNA or cRNA produced therefrom, and any nucleic acid derived therefrom, 
such as synthetic nucleic acid having a sequence derived from the gene corresponding to the 
marker gene. 



5.3 MARKERS USEFUL IN DIAGNOSIS AND PROGNOSIS OF BREAST CANCER 

5.3.1 MARKER SETS 

The invention provides a set of 4,986 genetic markers whose expression is 
correlated with the existence of breast cancer by clustering analysis. A subset of these 
markers identified as useful for diagnosis or prognosis is listed as SEQ ID NOS: 1-2,699. 
The invention also provides a method of using these markers to distinguish tumor types in 
diagnosis or prognosis. 

In one embodiment, the invention provides a set of 2,460 genetic markers 
that can classify breast cancer patients by estrogen receptor (ER) status; i.e., distinguish 
between ER(+) and ER(-) patients or tumors derived from these patients. ER status is an 
important indicator of the likelihood of a patient's response to some chemotherapies (i.e., 
tamoxifen). These markers are listed in Table 1. The invention also provides subsets of at 
least 5, 10, 25, 50, 100, 200, 300, 400, 500, 750, 1,000, 1,250, 1,500, 1,750 or 2,000 genetic 
markers, drawn from the set of 2,460 markers, which also distinguish ER(+) and ER(-) 
patients or tumors. Preferably, the number of markers is 550. The invention further 
provides a set of 550 of the 2,460 markers that are optimal for distinguishing ER status 
(Table 2). The invention also provides a method of using these markers to distinguish 
between ER(+) and ER(-) patients or tumors derived therefrom. 

In another embodiment, the invention provides a set of 430 genetic markers 
that can classify ER(-) breast cancer patients by BRCA1 status; i.e., distinguish between 
tumors containing a BRCA1 mutation and sporadic tumors. These markers are listed in 
Table 3. The invention further provides subsets of at least 5, 10, 20, 30, 40, 50, 75, 100, 
150, 200, 250, 300 or 350 markers, drawn from the set of 430 markers, which also 
distinguish between tumors containing a BRCA1 mutation and sporadic tumors. Preferably, 
the number of markers is 100. A preferred set of 100 markers is provided in Table 4. The 
invention also provides a method of using these markers to distinguish between BRCA1 and 
sporadic patients or tumors derived therefrom. 

In another embodiment, the invention provides a set of 231 genetic markers 
that can distinguish between patients with a good breast cancer prognosis (no breast cancer 
tumor distant metastases within five years) and patients with a poor breast cancer prognosis 
(tumor distant metastases within five years). These markers are listed in Table 5. The 
invention also provides subsets of at least 5, 10, 20, 30, 40, 50, 75, 100, 150 or 200 markers, 
drawn from the set of 231, which also distinguish between patients with good and poor 
prognosis. A preferred set of 70 markers is provided in Table 6. In a specific embodiment, 
the set of markers consists of the twelve kinase-related markers and the seven cell division- 
or mitosis-related markers listed. The invention also provides a method of using the above 
markers to distinguish between patients with good or poor prognosis. 

Table 1. 2,460 gene markers that distinguish ER(+) and ER(-) cell samples. 

GenBank Accession Number    SEQ ID NO         GenBank Accession Number    SEQ ID NO
[illegible]                 SEQ ID NO 70      NM_013230                   SEQ ID NO 1417
[illegible]                 SEQ ID NO 71      NM_013233                   SEQ ID NO 1418
[illegible]                 SEQ ID NO 72      NM_013238                   SEQ ID NO 1419
[illegible]                 SEQ ID NO 74      NM_013239                   SEQ ID NO 1420
AB037745                    SEQ ID NO 75      NM_013242                   SEQ ID NO 1421
[illegible]                 SEQ ID NO 76      NM_013257                   SEQ ID NO 1423
AB037765                    SEQ ID NO 77      NM_013261                   SEQ ID NO 1424
AB037778                    SEQ ID NO 78      NM_013262                   SEQ ID NO 1425
AB037791                    SEQ ID NO 79      NM_013277                   SEQ ID NO 1426
AB037793                    SEQ ID NO 80      NM_013296                   SEQ ID NO 1427
[illegible]                 SEQ ID NO 81      NM_013301                   SEQ ID NO 1428
[illegible]                 SEQ ID NO 82      NM_013324                   SEQ ID NO 1429
[illegible]                 SEQ ID NO 83      NM_013327                   SEQ ID NO 1430
[illegible]                 SEQ ID NO 84      NM_013336                   SEQ ID NO 1431
[illegible]                 SEQ ID NO 85      NM_013339                   SEQ ID NO 1432
[illegible]                 SEQ ID NO 86      NM_013363                   SEQ ID NO [illegible]
[illegible]                 SEQ ID NO 87      NM_013378                   SEQ ID NO [illegible]
[illegible]                 SEQ ID NO 88      NM_013384                   SEQ ID NO [illegible]
[illegible]                 SEQ ID NO 89      NM_013385                   SEQ ID NO 1437
[illegible]                 SEQ ID NO 90      NM_013406                   SEQ ID NO 1438
[illegible]                 SEQ ID NO 91      NM_013437                   SEQ ID NO 1439
[illegible]                 SEQ ID NO 92      NM_013451                   SEQ ID NO 1440
[illegible]                 SEQ ID NO 93      NM_013943                   SEQ ID NO 1441
[illegible]                 SEQ ID NO 94      NM_013994                   SEQ ID NO 1442
[illegible]                 SEQ ID NO [illegible]  NM_013995              SEQ ID NO 1443
[illegible]                 SEQ ID NO 97      NM_014026                   SEQ ID NO 1444
[illegible]                 SEQ ID NO 98      NM_014029                   SEQ ID NO 1445
[illegible]                 SEQ ID NO 99      NM_014036                   SEQ ID NO 1446
[illegible]                 SEQ ID NO 100     NM_014062                   SEQ ID NO 1447
[illegible]                 SEQ ID NO 101     NM_014074                   SEQ ID NO 1448
[illegible]                 SEQ ID NO 102     NM_014096                   SEQ ID NO 1450
[illegible]                 SEQ ID NO 103     NM_014109                   SEQ ID NO 1451
[illegible]                 SEQ ID NO 104     NM_014112                   SEQ ID NO 1452
[illegible]                 SEQ ID NO 105     NM_014147                   SEQ ID NO 1453
[illegible]                 SEQ ID NO 106     NM_014149                   SEQ ID NO 1454
[illegible]                 SEQ ID NO 107     NM_014164                   SEQ ID NO 1455
[illegible]                 SEQ ID NO 108     NM_014172                   SEQ ID NO 1456
[illegible]                 SEQ ID NO 109     NM_014175                   SEQ ID NO 1457
[illegible]                 SEQ ID NO 110     NM_014181                   SEQ ID NO 1458
[illegible]                 SEQ ID NO 111     NM_014184                   SEQ ID NO 1459
[illegible]                 SEQ ID NO 112     NM_014211                   SEQ ID NO 1460
[illegible]                 SEQ ID NO 113     NM_014214                   SEQ ID NO 1461
[illegible]                 SEQ ID NO 114     NM_014216                   SEQ ID NO 1462
AF047826                    SEQ ID NO 115     NM_014241                   SEQ ID NO 1463
AF049460                    SEQ ID NO 116     NM_014246                   SEQ ID NO 1465
AF052101                    SEQ ID NO 117     NM_014268                   SEQ ID NO 1466
AF052117                    SEQ ID NO 118     NM_014272                   SEQ ID NO 1467
[illegible]                 SEQ ID NO 119     NM_014274                   SEQ ID NO 1468
[illegible]                 SEQ ID NO 120     NM_014289                   SEQ ID NO 1469
[illegible]                 SEQ ID NO 122     NM_014298                   SEQ ID NO 1470
[illegible]                 SEQ ID NO 123     NM_014302                   SEQ ID NO 1471
[illegible]                 SEQ ID NO 126     NM_014315                   SEQ ID NO 1473
[illegible]                 SEQ ID NO 127     NM_014316                   SEQ ID NO 1474
[illegible]                 SEQ ID NO 128     NM_014317                   SEQ ID NO 1475
[illegible]                 SEQ ID NO 129     NM_014320                   SEQ ID NO 1476
[illegible]                 SEQ ID NO 130     NM_014321                   SEQ ID NO 1477
[illegible]                 SEQ ID NO 131     NM_014325                   SEQ ID NO 1478
[illegible]                 SEQ ID NO 132     NM_014335                   SEQ ID NO 1479
[illegible]                 SEQ ID NO 133     NM_014363                   SEQ ID NO 1480
[illegible]                 SEQ ID NO 134     NM_014364                   SEQ ID NO 1481
[illegible]                 SEQ ID NO 135     NM_014365                   SEQ ID NO 1482
[illegible]                 SEQ ID NO [illegible]  NM_014373              SEQ ID NO 1483
[illegible]                 SEQ ID NO 139     NM_014382                   SEQ ID NO 1484
[illegible]                 SEQ ID NO 140     NM_014395                   SEQ ID NO 1485
[illegible]                 SEQ ID NO 142     NM_014398                   SEQ ID NO 1486
[illegible]                 SEQ ID NO 143     NM_014399                   SEQ ID NO 1487
[illegible]                 SEQ ID NO 144     NM_014402                   SEQ ID NO 1488
[illegible]                 SEQ ID NO 145     NM_014428                   SEQ ID NO 1489
[illegible]                 SEQ ID NO 146     NM_014448                   SEQ ID NO 1490
[illegible]                 SEQ ID NO 147     NM_014449                   SEQ ID NO 1491
[illegible]                 SEQ ID NO 148     NM_014450                   SEQ ID NO 1492
[illegible]                 SEQ ID NO 149     NM_014452                   SEQ ID NO 1493
[illegible]                 SEQ ID NO 150     NM_014453                   SEQ ID NO 1494
[illegible]                 SEQ ID NO 151     NM_014456                   SEQ ID NO 1495
[illegible]                 SEQ ID NO 152     NM_014479                   SEQ ID NO 1497
[illegible]                 SEQ ID NO 153     NM_014501                   SEQ ID NO 1498
[illegible]                 SEQ ID NO 154     NM_014552                   SEQ ID NO 1500
AF116682                    SEQ ID NO [illegible]  NM_014553              SEQ ID NO 1501
AF118224                    SEQ ID NO 157     NM_014570                   SEQ ID NO 1502
[illegible]                 SEQ ID NO 158     NM_014575                   SEQ ID NO 1503
AF119256                    SEQ ID NO 159     NM_014585                   SEQ ID NO 1504
AF119665                    SEQ ID NO 160     NM_014595                   SEQ ID NO 1505
AF121255                    SEQ ID NO 161     NM_014624                   SEQ ID NO 1507
AF131748                    SEQ ID NO 162     NM_014633                   SEQ ID NO 1508
[illegible]                 SEQ ID NO 163     NM_014640                   SEQ ID NO 1509
[illegible]                 SEQ ID NO 164     NM_014642                   SEQ ID NO 1510
[illegible]                 SEQ ID NO 165     NM_014643                   SEQ ID NO 1511
[illegible]                 SEQ ID NO 166     NM_014656                   SEQ ID NO 1512
[illegible]                 SEQ ID NO 167     NM_014668                   SEQ ID NO 1513
[illegible]                 SEQ ID NO 168     NM_014669                   SEQ ID NO 1514
[illegible]                 SEQ ID NO 169     NM_014673                   SEQ ID NO 1515
[illegible]                 SEQ ID NO 170     NM_014675                   SEQ ID NO 1516
[illegible]                 SEQ ID NO 171     NM_014679                   SEQ ID NO 1517
[illegible]                 SEQ ID NO 172     NM_014680                   SEQ ID NO 1518
[illegible]                 SEQ ID NO 174     NM_014696                   SEQ ID NO 1519
[illegible]                 SEQ ID NO 175     NM_014700                   SEQ ID NO 1520
[illegible]                 SEQ ID NO 176     NM_014715                   SEQ ID NO 1521
[illegible]                 SEQ ID NO 177     NM_014721                   SEQ ID NO 1522
[illegible]                 SEQ ID NO 178     NM_014737                   SEQ ID NO 1524
[illegible]                 SEQ ID NO 179     NM_014738                   SEQ ID NO 1525
[illegible]                 SEQ ID NO 180     NM_014747                   SEQ ID NO 1526
[illegible]                 SEQ ID NO 181     NM_014750                   SEQ ID NO 1527
AF186780                    SEQ ID NO 182     NM_014754                   SEQ ID NO 1528
[illegible]                 SEQ ID NO 184     NM_014767                   SEQ ID NO 1529
[illegible]                 SEQ ID NO 185     NM_014770                   SEQ ID NO 1530
[illegible]                 SEQ ID NO 186     NM_014773                   SEQ ID NO 1531
[illegible]                 SEQ ID NO 187     NM_014776                   SEQ ID NO 1532
[illegible]                 SEQ ID NO 188     NM_014782                   SEQ ID NO 1533
[illegible]                 SEQ ID NO 189     NM_014785                   SEQ ID NO 1534
[illegible]                 SEQ ID NO 190     NM_014791                   SEQ ID NO 1535
[illegible]                 SEQ ID NO 191     NM_014808                   SEQ ID NO 1536
[illegible]                 SEQ ID NO 192     NM_014811                   SEQ ID NO 1537
[illegible]                 SEQ ID NO 193     NM_014812                   SEQ ID NO 1538
[illegible]                 SEQ ID NO 194     NM_014838                   SEQ ID NO 1540
[illegible]                 SEQ ID NO 195     NM_014862                   SEQ ID NO 1542
[illegible]                 SEQ ID NO 196     NM_014865                   SEQ ID NO 1543
[illegible]                 SEQ ID NO 197     NM_014870                   SEQ ID NO 1544
AJ225092                    SEQ ID NO 198     NM_014875                   SEQ ID NO 1545
AJ225093                    SEQ ID NO 199     NM_014886                   SEQ ID NO 1547
AJ249377                    SEQ ID NO 200     NM_014889                   SEQ ID NO 1548
AJ270996                    SEQ ID NO 202     NM_014905                   SEQ ID NO 1549
AJ272057                    SEQ ID NO 203     NM_014935                   SEQ ID NO 1550
AJ275978                    SEQ ID NO 204     NM_014945                   SEQ ID NO 1551
AJ276429                    SEQ ID NO 205     NM_014965                   SEQ ID NO 1552
AK000004                    SEQ ID NO 206     NM_014967                   SEQ ID NO 1553
AK000005                    SEQ ID NO 207     NM_014968                   SEQ ID NO 1554
[illegible]                 SEQ ID NO 208     NM_015032                   SEQ ID NO 1555
[illegible]                 SEQ ID NO 209     NM_015239                   SEQ ID NO 1556
[illegible]                 SEQ ID NO 210     NM_015383                   SEQ ID NO 1557
[illegible]                 SEQ ID NO 212     NM_015392                   SEQ ID NO 1558
[illegible]                 SEQ ID NO 213     NM_015416                   SEQ ID NO 1559
[illegible]                 SEQ ID NO 214     NM_015417                   SEQ ID NO 1560
AK000643                    SEQ ID NO 216     NM_015420                   SEQ ID NO 1561
[illegible]                 SEQ ID NO 217     NM_015434                   SEQ ID NO 1562
[illegible]                 SEQ ID NO 218     NM_015474                   SEQ ID NO 1563
[illegible]                 SEQ ID NO 220     NM_015507                   SEQ ID NO 1565
[illegible]                 SEQ ID NO 221     NM_015513                   SEQ ID NO 1566
[illegible]                 SEQ ID NO 223     NM_015515                   SEQ ID NO 1567
AK001164                    SEQ ID NO 224     NM_015523                   SEQ ID NO 1568
AK001166                    SEQ ID NO 225     NM_015524                   SEQ ID NO 1569
AK001295                    SEQ ID NO 226     NM_015599                   SEQ ID NO 1571
[illegible]                 SEQ ID NO 227     NM_015623                   SEQ ID NO 1572
[illegible]                 SEQ ID NO 228     NM_015640                   SEQ ID NO 1573
[illegible]                 SEQ ID NO 229     NM_015641                   SEQ ID NO 1574
[illegible]                 SEQ ID NO 230     NM_015678                   SEQ ID NO 1575
[illegible]                 SEQ ID NO 231     NM_015721                   SEQ ID NO 1576
[illegible]                 SEQ ID NO 232     NM_015892                   SEQ ID NO 1578
AK001872                    SEQ ID NO 234     NM_015895                   SEQ ID NO 1579
[illegible]                 SEQ ID NO 235     NM_015907                   SEQ ID NO 1580
[illegible]                 SEQ ID NO 236     NM_015925                   SEQ ID NO 1581
[illegible]                 SEQ ID NO 237     NM_015937                   SEQ ID NO 1582
[illegible]                 SEQ ID NO [illegible]  NM_015954              SEQ ID NO 1583
[illegible]                 SEQ ID NO 241     NM_015955                   SEQ ID NO 1584
AL049265                    SEQ ID NO 242     NM_015961                   SEQ ID NO 1585
AL049365                    SEQ ID NO 244     NM_015984                   SEQ ID NO 1587
AL049370                    SEQ ID NO 245     NM_015986                   SEQ ID NO 1588
AL049381                    SEQ ID NO 246     NM_015987                   SEQ ID NO 1589
AL049397                    SEQ ID NO 247     NM_015991                   SEQ ID NO 1590
AL049415    SEQ ID NO 248    NM_016002    SEQ ID NO 1592
AL049667    SEQ ID NO 249    NM_016028    SEQ ID NO 1594
AL049801    SEQ ID NO 250    NM_016029    SEQ ID NO 1595
AL049932    SEQ ID NO 251    NM_016047    SEQ ID NO 1596
AL049935    SEQ ID NO 252    NM_016048    SEQ ID NO 1597
AL049943    SEQ ID NO 253    NM_016050    SEQ ID NO 1598
AL049949    SEQ ID NO 254    NM_016056    SEQ ID NO 1599
AL049963    SEQ ID NO 255    NM_016058    SEQ ID NO 1600
AL049987    SEQ ID NO 256    NM_016066    SEQ ID NO 1601
AL050021    SEQ ID NO 257    NM_016072    SEQ ID NO 1602
AL050024    SEQ ID NO 258    NM_016073    SEQ ID NO 1603
AL050090    SEQ ID NO 259    NM_016108    SEQ ID NO 1605
AL050148    SEQ ID NO 260    NM_016109    SEQ ID NO 1606
AL050151    SEQ ID NO 261    NM_016121    SEQ ID NO 1607
AL050227    SEQ ID NO 262    NM_016126    SEQ ID NO 1608
AL050367    SEQ ID NO 263    NM_016127    SEQ ID NO 1609
AL050370    SEQ ID NO 264    NM_016135    SEQ ID NO 1610
AL050371    SEQ ID NO 265    NM_016142    SEQ ID NO 1612
AL050372    SEQ ID NO 266    NM_016153    SEQ ID NO 1613
AL050388    SEQ ID NO 267    NM_016171    SEQ ID NO 1614
AL079276    SEQ ID NO 268    NM_016175    SEQ ID NO 1615
AL079298    SEQ ID NO 269    NM_016184    SEQ ID NO 1616
AL080079    SEQ ID NO 271    NM_016185    SEQ ID NO 1617
AL080192    SEQ ID NO 273    NM_016187    SEQ ID NO 1618
AL080199    SEQ ID NO 274    NM_016199    SEQ ID NO 1619
AL080209    SEQ ID NO 275    NM_016210    SEQ ID NO 1620
AL080234    SEQ ID NO 277    NM_016217    SEQ ID NO 1621
AL080235    SEQ ID NO 278    NM_016228    SEQ ID NO 1623
AL096737    SEQ ID NO 279    NM_016229    SEQ ID NO 1624
AL110126    SEQ ID NO 280    NM_016235    SEQ ID NO 1625
AL110139    SEQ ID NO 281    NM_016240    SEQ ID NO 1626
AL110202    SEQ ID NO 283    NM_016243    SEQ ID NO 1627
AL110212    SEQ ID NO 284    NM_016250    SEQ ID NO 1628
AL110260    SEQ ID NO 285    NM_016267    SEQ ID NO 1629
AL117441    SEQ ID NO 286    NM_016271    SEQ ID NO 1630
AL117452    SEQ ID NO 287    NM_016299    SEQ ID NO 1631
AL117477    SEQ ID NO 288    NM_016306    SEQ ID NO 1632
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Al i i7^(V> 
ALT I ( D\J£. 


qprj in NO 9RQ 


NM_016308


SEQ ID NO 1634 


AHi 7KOO 


ccn ID NO 2Q0 

OC_Vj« II-/ INu £-<J\J 


NM_016321


SEQ ID NO 1635 


Alii 7*>Q* : % 

alt i ( oyo 


cpn in NO 9Q1 


NM_016337


SEQ ID NO 1636 


Alii 7^0Q 

al i i ( oyy 


qpn in NO 9Q2 


NM_016352


SEQ ID NO 1637 


Alii 7ftnn 

ALT 1 / OUU 


qfo in Mn i 


NM_016359


SEQ ID NO 1638 


Al i i 7RnQ 
ALT I / DUy 


oca in mo 9Q4 


NM_016401


SEQ ID NO 1641 


Alii 7fii 7 
AL llfDIf 


opn in NO 2Q5 

OCvk It-/ 11V ^Zf\J 


NM_016403


SEQ ID NO 1642 


Al i i7fififi 
AL 1 1 ( ODD 


qpn m NO 296 


NM_016411


SEQ ID NO 1643 


Al ioon^ 

AL IZZUOO 


qpn in NO 9Q7 ! 

OCVj( IU INU £-<J * 


NM_016423


SEQ ID NO 1644 


Al i ^^n*^^ 

AL I OOKJOO 


qFO in NO 2Q8 


NM_016463


SEQ ID NO 1647 


AL loOuoO 


qpn in NO 2QQ 


NM_016475


SEQ ID NO 1649 


A 1 i A7A 
ALlOOU/^f 


qFO in NO ^01 

OCvk 11-/ INU Ow 1 


NM_016477


SEQ ID NO 1650 


Al i ***3AQfi 

alt oouyo 


QFO in NO ^09 

OCU 1 LJ INv Out 


NM_016491


SEQ ID NO 1651 


Al iQQifl^ 
ALT OO 1 UO 


qFO m NO 


NM_016495


SEQ ID NO 1652 


Al i^QiOA 
AL 1 Ool UO 


QFO in NO ^04 

OuU IU INU Out 


NM_016542


SEQ ID NO 1653 


Al 11Q^70 
ALTooO/ Z 


qpn in no 


NM_016548


SEQ ID NO 1654 


Al iQQAiQ 
AL 1 OOO i y 


qpn in NO ^07 


NM_016569


SEQ ID NO 1655 


AL lOODZZ 


oca m NO ^Oft 
OLU 1 U INU OUO 


NM_016577


SEQ ID NO 1656 


Al i 

ALT ooOZo 


qnn in NO ^0Q 

OlU J U INU OUc7 


NM_016582


SEQ ID NO 1657 


ALT OODZ^- 


qpn m no ^10 

OCU 1 LJ INU O 1 \J 


NM_016593


SEQ ID NO 1658 


AL lOODOZ 


qpn m no ^1 1 

OuU I U INU O I 1 


NM_016603


SEQ ID NO 1659 


Al i1Q£/iA 

AL lOOO^Kf 


oca in NO ^19 

OCU IU INU O I £- 


NM_016612


SEQ ID NO 1660 


A 1 i 

ALT ooD^-O 


qpA m NO 

OuU 1 LJ INU O IO 


NM_016619


SEQ ID NO 1661 


Al iQ^At^i 
AL lOOOOl 


QFO in NO **14 
olU IU INU O l*t 


NM_016623


SEQ ID NO 1663 


A 1 i Q7Q i A 

ALIo/o IU 


OtZW IU INU O IO 


NM_016625


SEQ ID NO 1664 


A 1 i Q*7Qi CI 

ALIo/olO 


QFO in NO ^17 
OCU IU INU O I f 


NM_016629


SEQ ID NO 1665 


Al 4 07000 

AL1 ofooZ 


oca in MO ^i P 
OCU IU INU O IO 


NM_016640


SEQ ID NO 1666 


Al -107*3^0 

ALT o/ o4Z 


QFO in MO ^1 Q 

otu iu inu o iy 


NM_016645


SEQ ID NO 1667 


Al 4Q7QCO 


QFO in MO **91 
OCU IU INU O/L I 


NM_016650


SEQ ID NO 1668


Al iQTQQi 

ALT or oo i 


QFO in MO ^99 
otZW IU INU OsL£. 


NM_016657


SEQ ID NO 1669


Al iQ"7>IA7 

AL IO/4U/ 


QFO in MO ^9^ 
OuW IU INU OZO 


NM_016733


SEQ ID NO 1670 


ALT Of 44o 


QFO in MO ^94 
ulU IU INU 0<t*t 


NM_016815


SEQ ID NO 1671 


AL137502    SEQ ID NO 326    NM_016817    SEQ ID NO 1672
AL137514    SEQ ID NO 327    NM_016818    SEQ ID NO 1673
AL137540    SEQ ID NO 328    NM_016839    SEQ ID NO 1675
AL137566    SEQ ID NO 330    NM_017414    SEQ ID NO 1676
AL137615    SEQ ID NO 331    NM_017422    SEQ ID NO 1677


Al 1^"7ft7Q 
ALTo/ Of O 


opo in mo 

uLU IU InU OOO 


NM_017423


SEQ ID NO 1678


ML. 1 0 r # 1 o 


oca in MO **^R 


NM_017447


SEQ ID NO 1679


ML I O / / OO 


ocn in MO ^7 


NM_017518


SEQ ID NO 1680


ML. lOf rul 


°.FO in MO ^R 

OLU IL/ IMw OOO 


NM_017522


SEQ ID NO 1681


Al 1**77R1 
MU lOf / O I 


ccn m MO 

OL_W 1 L/ IM V-/ OOO 


NM_017540


SEQ ID NO 1682


ML 1 O f 40 1 


qcn in MO ^10 

OL_W 1 l_/ INU O^tVJ 


NM_017555


SEQ ID NO 1683


Al 1^7/f50 
AL ID/ 4oZ 


opn in mo 

uLU IL/ IMU 04 I 


NM_017572


SEQ ID NO 1684


AL10/4D4 


Qcn in MO 1AO 
otU IU iMU 04*£ 


NM_017585


SEQ ID NO 1685


A 1 *i K"7/1 7£ 
AL10/4/0 


ocn in mo iai 
OtU IU PMU o4o 


NM_017586


SEQ ID NO 1686


ML l O/ 4oU 


ocn in MO 
oLU IU IMU OH-^f 


NM_017596


SEQ ID NO 1687


ML I O / 404 


oca in MO ^KA^k 
OCU IU INv 0*fO 


NM_017606


SEQ ID NO 1688


A I i^7AR4 
ML I D / 404 


oca m MO LAG 

OCU iL/ INv OtO 


NM_017617


SEQ ID NO 1689


ML I O / 


oca m MO ^A7 
OlU IU IMU 04/ 


NM_017633


SEQ ID NO 1690


Al 1 R7^HR 
ALT Of DUO 


oca m MO 1A9K 
OCU IU INU 040 


NM_017634


SEQ ID NO 1691


AL ID / oOl 


OCA in MO QAQ 

OtU IU NU o4y 


NM_017646


SEQ ID NO 1692


ALT OUT o 1 


oca in mo ocn 
otU IU Imu oDU 


NM_017660


SEQ ID NO 1693


alt on you 


oca m MO **/^1 
otU IU IMU oO I 


NM_017680


SEQ ID NO 1694


AL iD^U4y 


oca m MO *kRO 


NM_017691


SEQ ID NO 1695


MLODO / UO 


oca m MO 
OlU I L/ INU OOO 


NM_017698


SEQ ID NO 1696


ni^R4<* 

L/ I OD^+O 


OCA m MO 
OlU IU IMU OOO 


NM_017702


SEQ ID NO 1697


U l*fO# O 


oca in MO ^fi 
OlU ILy INVJ OOO 


NM_017731


SEQ ID NO 1699




oca m MO ^7 
OlU IU IMU OO/ 


NM_017732


SEQ ID NO 1700


no£07n 

U/lOU / U 


oca m MO '^ft 
OlU IU IMU OOO 


NM_017733


SEQ ID NO 1701




oca m MO **^Q 
OlU IU IMU OOo 


NM_017734


SEQ ID NO 1702


Uo tOO/ 


oca m mo 

OlU IU IMU ODU 


NM_017746


SEQ ID NO 1703


UvjOO^ 1 


oca m MO **ft1 
OlU IU IMU OO I 


NM_017750


SEQ ID NO 1704


UOOjOu 


oca m MO 1f\0 
OlU IU IMU OOZ 


NM_017761


SEQ ID NO 1705




oca m MO ^fi** 

OlU 1 LJ IMU OOO 


NM_017763


SEQ ID NO 1706


nA90A7 
L/4ZU4 / 


OCA |P| MA OCJ. 
OlU IU IMU OOH- 


NM_017770


SEQ ID NO 1707


U4oyOU 


oca m mo 

otu IU IMU ODD 


NM_017779


SEQ ID NO 1708


ncn>i no 
UOU4UZ 


OCA in MO QfiC 

otU IU ImU oOO 


NM_017780


SEQ ID NO 1709


UDUy 14 


OCA in MA OC7 

otu IU l\H-J oO/ 


NM_017782


SEQ ID NO 1710


UDOf 1 D 


oca m MO 
OtU IU IMU OOO 


NM_017786


SEQ ID NO 1711


D80001    SEQ ID NO 369    NM_017791    SEQ ID NO 1712
D80010    SEQ ID NO 370    NM_017805    SEQ ID NO 1713
D82345    SEQ ID NO 371    NM_017816    SEQ ID NO 1714
D83781    SEQ ID NO 372    NM_017821    SEQ ID NO 1715




D86964    SEQ ID NO 373    NM_017835    SEQ ID NO 1716
D86978    SEQ ID NO 374    NM_017843    SEQ ID NO 1717
D86985    SEQ ID NO 375    NM_017857    SEQ ID NO 1718
D87076    SEQ ID NO 376    NM_017901    SEQ ID NO 1719
D87453    SEQ ID NO 377    NM_017906    SEQ ID NO 1720
D87469    SEQ ID NO 378    NM_017918    SEQ ID NO 1721
D87682    SEQ ID NO 379    NM_017961    SEQ ID NO 1722


poc /I oo 

u2o4Uo 


SEQ ID NO 380


NM_017996


SEQ ID NO 1723


J02639    SEQ ID NO 381    NM_018000    SEQ ID NO 1724
J04162    SEQ ID NO 382    NM_018004    SEQ ID NO 1725
K02403    SEQ ID NO 384    NM_018011    SEQ ID NO 1726
L05096    SEQ ID NO 385    NM_018014    SEQ ID NO 1727
L10333    SEQ ID NO 386    NM_018022    SEQ ID NO 1728
L11645    SEQ ID NO 387    NM_018031    SEQ ID NO 1729
L21934    SEQ ID NO 388    NM_018043    SEQ ID NO 1730
L22005    SEQ ID NO 389    NM_018048    SEQ ID NO 1731
L48692    SEQ ID NO 391    NM_018062    SEQ ID NO 1732
M12758    SEQ ID NO 392    NM_018069    SEQ ID NO 1733
M15178    SEQ ID NO 393    NM_018072    SEQ ID NO 1734
M21551    SEQ ID NO 394    NM_018077    SEQ ID NO 1735
M24895    SEQ ID NO 395    NM_018086    SEQ ID NO 1736
M26383    SEQ ID NO 396    NM_018087    SEQ ID NO 1737
M27749    SEQ ID NO 397    NM_018093    SEQ ID NO 1738
M28170    SEQ ID NO 398    NM_018098    SEQ ID NO 1739
M29873    SEQ ID NO 399    NM_018099    SEQ ID NO 1740
M29874    SEQ ID NO 400    NM_018101    SEQ ID NO 1741
M30448    SEQ ID NO 401    NM_018103    SEQ ID NO 1742
M30818    SEQ ID NO 402    NM_018109    SEQ ID NO 1744
M31932    SEQ ID NO 403    NM_018123    SEQ ID NO 1746
M37033    SEQ ID NO 404    NM_018131    SEQ ID NO 1747
M55914    SEQ ID NO 405    NM_018136    SEQ ID NO 1748
M63438    SEQ ID NO 406    NM_018138    SEQ ID NO 1749




SEQ ID NO 407


NM_018166


SEQ ID NO 1750


M68874    SEQ ID NO 408    NM_018171    SEQ ID NO 1751
M73547    SEQ ID NO 409    NM_018178    SEQ ID NO 1752
M77142    SEQ ID NO 410    NM_018181    SEQ ID NO 1753
M80899    SEQ ID NO 411    NM_018186    SEQ ID NO 1754
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MQOOOO 
IVIOOO^Z. 


SEQ ID NO 412


NM_018188


SEQ ID NO 1756


IViwwww 1 


SEQ ID NO 413


NM_018194


SEQ ID NO 1757


MQ371ft 
IVI wv> f 1 O 


SEQ ID NO 414


NM_018204


SEQ ID NO 1758


MQfi^77 


SEQ ID NO 415


NM_018208


SEQ ID NO 1759


mm nnnn99 


SEQ ID NO 417


NM_018212


SEQ ID NO 1760


IN IVI UUUvHH 


SEQ ID NO 418


NM_018234


SEQ ID NO 1763


mm nnnn^n 

IN IVI \J\J\J\J\j\J 


SEQ ID NO 419


NM_018255


SEQ ID NO 1764


mm nnnn^7 

IN IVI UUUUU I 


SEQ ID NO 420


NM_018257


SEQ ID NO 1765


mm nnnnRn 

IN IVI UUUUOU 


SEQ ID NO 421


NM_018265


SEQ ID NO 1766


mm nnnnfiA 

IN IVI vJkJIJW-t 


SEQ ID NO 422


NM_018271


SEQ ID NO 1767


NM_000073


SEQ ID NO 424


NM_018290


SEQ ID NO 1768


NM_000077


SEQ ID NO 425


NM_018295


SEQ ID NO 1769


MM OOOOttfi 
IN IVI UUUUOD 


SEQ ID NO 426


NM_018304


SEQ ID NO 1770


MR/1 OOOOR7 
IN IVI UUUUO/ 


SEQ ID NO 427


NM_018306


SEQ ID NO 1771


MM OOOOQ^ 

in IVI uuuuyo 


SEQ ID NO 429


NM_018326


SEQ ID NO 1772


MM OOOOQfi 

in ivi uuuuyo 


SEQ ID NO 430


NM_018346


SEQ ID NO 1773


NM_000100


SEQ ID NO 431


NM_018366


SEQ ID NO 1775


NM_000101


SEQ ID NO 432


NM_018370


SEQ ID NO 1776


NM_000104


SEQ ID NO 433


NM_018373


SEQ ID NO 1777


MM 0001 0Q 

IN IVI UUU 1 ww 


SEQ ID NO 434


NM_018379


SEQ ID NO 1778


MM 00019^ 

IN IVI UUU \£~\J 


SEQ ID NO 435


NM_018384


SEQ ID NO 1779


MM 0001 97 

IN IVI UUU 1 c~ I 


SEQ ID NO 436


NM_018389


SEQ ID NO 1780


MM 00013^ 

IN IVI UUU IOU 


SEQ ID NO 437


NM_018410


SEQ ID NO 1783


NM_000137


SEQ ID NO 438


NM_018439


SEQ ID NO 1785


NM 00014fi 
— 1 *tu 


SEQ ID NO 439


NM_018454


SEQ ID NO 1786


MM 00014Q 

IN IVI UUU I*t57 


SEQ ID NO 440


NM_018455


SEQ ID NO 1787


MM 0001*54 

Ml VI ^ UUU IUt 


SEQ ID NO 441


NM_018465


SEQ ID NO 1788


MM 0001R1 

IN IVI UUU 1 U 1 


SEQ ID NO 443


NM_018471


SEQ ID NO 1789


MM 0001 

IN IVI UUU I ww 


SEQ ID NO 444


NM_018478


SEQ ID NO 1790


MM 0001 ft A 
IN IVI UUU I DO 


SEQ ID NO 445


NM_018479


SEQ ID NO 1791


MM 0001 fiQ 
IN IVI UUU I ww 


SEQ ID NO 446


NM_018529


SEQ ID NO 1793


MM 0001 

IN IVI UUU I f O 


SEQ ID NO 447


NM_018556


SEQ ID NO 1794


MM 0001Q1 

IM1VJ UUU 1 w 1 


SEQ ID NO 448


NM_018569


SEQ ID NO 1795


NM_000201    SEQ ID NO 450    NM_018584    SEQ ID NO 1796
NM_000211    SEQ ID NO 451    NM_018653    SEQ ID NO 1797
NM_000213    SEQ ID NO 452    NM_018660    SEQ ID NO 1798
NM_000224    SEQ ID NO 453    NM_018683    SEQ ID NO 1799




NIVI 


uuuzoy 


OCA ir\ MA A PLA 

obU ID InU 4o4 


NIVI u 1 OOOw 


SEQ ID NO 1800


MM 


UUU^CU 1 


oca i r\ ma A nc 
obQ ID NU 4oo 


NM U 1 0000 


SEQ ID NO 1801


NIVI 


oon9fi8 


OCA 1 r\ MO A KkCZ 


MM niftfiQ^ 

nivi u i ooyo 


SEQ ID NO 1802


NIVI 


f U 


OCA l a MO /t^R 
ocvj IU InvJ 40O 


N IVI U I O 1 £Sj 


SEQ ID NO 1803


MM 
NIVI 


nnn97i 


ceo in MO AE\Q 
otU IU INC/ 40» 


MM fMftftAO 
NIVI U 1 OOHU 


SEQ ID NO 1804


MM 
NIVI 


uuuzoo 


CCA IO MO Aftfl 
OtU IU InU 40U 


MM fMRftA9 

N IVI U I 004^. 


SEQ ID NO 1805


MM 
NIVI 


nnn9AA 


OCA IA MO ACZA 
ocvj IU 401 


mm niacin 
nivi u i oyou 


SEQ ID NO 1806


MM 
NIVI 


nnn9RR 
uuuzoo 


OCA 1 A MA A CO 

obU IU NU 4oZ 


\iivi_u i oyoo 


SEQ ID NO 1807


MM 
NIVI 


uuuzy i 


OCA 1 A MA /I CJQ 

otU IU NU 40O 


n*iQnnn 

NIVI U loUUU 


SEQ ID NO 1808


MM 
NIVI 


nnn9QQ 


OCA in MO ACXA 

otU IU NU 404 


MM H1Q01^ 

nivi u i yu i o 


SEQ ID NO 1809


MM 


uuvouv 


OCO 1 A MO ACtPi 

otU IU NU 4uO 


nivj u i yu^o 


SEQ ID NO 1810


MM 
NIVI 


uuuo i u 


OCO IA MO A CtCt 

ohU IU NU 400 


MM fMOfl97 

inivi_u i yuz/ 


SEQ ID NO 1811


MM 
NIVI 


UUUO 1 1 


OCO IA MO ACV7 

ocQ IU NU 4o/ 


MM fHOHAi 
INM U1 cJU4 1 


qcn m NO 1819 

ObU IL/ IViw IO \C 


MM 

INI VI 


UUUoT / 


OCO IA MO yICQ 

ScQ IU NU 4bo 


mm n^QHAA 
IM M_JJ 1 y U44 


oca in mo 181*3 

OlW IL/ IM w IO IO 


NM 


UUUoZU 


OCA IA MA A CO 

ocQ ID NU 4fay 


MM rHQORI 


' QPn in mo ifti^i 

OuU IL/ INU IO IO 


NIVI 


aaaq/IO 
UUUo4^ 


OCA IA K 1 /*^\ A~7f\ 

SEQ ID NO 470 


MM fHOOQA 


oca m MO 1ft1fi 

OtU I L/ IMVJ IO IO 


KIR A 
NM 


aaaq/1 £5 
UUUo4D 


Or^/""\ IA K 1 /~\ A~7 4 

ScQ ID NO 471 


NM_UiyOD4 


OCA 1 A KIA iftiT 
OlW ILJ IMU IO I f 


MM 

INI VI 


uUUooz 


OCA IA K 1 f~\ A ~7*~) 

SEQ ID NO 472 


mki a^oo/ik 
NM_U1 yo4D 


oca m MO 1R1fl 

OCU I Li INU IO IO 


NM 


UUUoDO 


oc^\ ia k i/"\ >nro 

SEQ ID NO 473 


MM HHOQ^fi 

nm__u nyooo 


oca m MO 1A1Q 

OCU IL/ INU IO 127 


MM 


JJvJUwDo 


OCA IA KIA A~7 A 

ScQ ID NO 4/4 


k i k » non*i^n 


CiFO in MO 1890 

OlW IL/ INU lO^U 


MM 

NIVI 


uuuooy 


OCA IA MO y17K 

obQ ID NU 4/0 


mm noni^*^ 


^FO in NO 1891 

OCU IL/ INU 1 1 


MM 

INM 


UUUoDO 


OCO IA MO A ~7CZ 

ScQ ID NU 4/b 


mm aoai/1^ 


oca m NO 1899 

OlU IU INU 1 


MM 

INM 


UUUODO 


OC/"^ IA K A 77 

SEQ ID NQ 477 


IMM_UZUlOU 


CiFO in NO 189*3 

OCU IL/ INU lO^O 


MM 
INM 


UUUOO 1 


OC/"\ IA KIA A 70 

SEQ ID NO 478 


KIKil nOAHftQ 


oca m MO 1894 

OCU I L/ INU IOt*T 


MM 
INM 


uuuoy # 


OCA IA KIO AQCi 

ScQ ID NO 4oU 


NM__UZU1 DO 


oca m NO 189*S 

OCU IL/ INU lO^J 


MM 
IN IVI 




OCO IA MO AO.*. 

obQ ID NO 4ol 


mm noniftQ 


oca m NO 1896 

OCU IU INU IOZ.U 


MM 
IN IVI 


UUU*T I *T 


OCO IA MO A QO 

obU IU NU 40ii 


K\tiA non itq 
in ivi uzu i # y 


<^FO in NO 1897 


MM 
IN lvl_ 


UUU^t I u 


ceo in MO Aft^ 
OtU IU INU 400 


MM 

INIVI IO*t 


SEQ ID NO 1828


MM 
INM 


wUU*h&^ 


OCO I n MO A &A 
otU IU NU 4o4 


MM 09H1AR 
INIVI I OO 


SEQ ID NO 1829


MM 

INM 


UUUn-Z^f 


oco in kio a qp± 
otU IU NU 4o0 


INM UZU IOO 


SEQ ID NO 1830


NM_000433


SEQ ID NO 486


NM_UZUloy 


oca m MO 18*^1 

OCU IL/ INU IOO 1 


NM_000436


SEQ ID NO 487


INM^UZU iy/ 


oca m MO 18*39 

OCU IU INU IOO^ 


NM_000450


SEQ ID NO 488


mm 090199 


SEQ ID NO 1833 


NM_000462    SEQ ID NO 489    NM_020215    SEQ ID NO 1834
NM_000495    SEQ ID NO 490    NM_020347    SEQ ID NO 1836
NM_000507    SEQ ID NO 491    NM_020365    SEQ ID NO 1837
NM_000526    SEQ ID NO 492    NM_020386    SEQ ID NO 1838


NM_000557    SEQ ID NO 493    NM_020445    SEQ ID NO 1839
NM_000560    SEQ ID NO 494    NM_020639    SEQ ID NO 1840
NM_000576    SEQ ID NO 495    NM_020659    SEQ ID NO 1841
NM_000579    SEQ ID NO 496    NM_020675    SEQ ID NO 1842
NM_000584    SEQ ID NO 497    NM_020686    SEQ ID NO 1843
NM_000591    SEQ ID NO 498    NM_020974    SEQ ID NO 1844
NM_000592    SEQ ID NO 499    NM_020978    SEQ ID NO 1845
NM_000593    SEQ ID NO 500    NM_020979    SEQ ID NO 1846
NM_000594    SEQ ID NO 501    NM_020980    SEQ ID NO 1847
NM_000597    SEQ ID NO 502    NM_021000    SEQ ID NO 1849
NM_000600    SEQ ID NO 504    NM_021004    SEQ ID NO 1850
NM_000607    SEQ ID NO 505    NM_021025    SEQ ID NO 1851
NM_000612    SEQ ID NO 506    NM_021063    SEQ ID NO 1852
NM_000627    SEQ ID NO 507    NM_021065    SEQ ID NO 1853
NM_000633    SEQ ID NO 508    NM_021077    SEQ ID NO 1854
NM_000636    SEQ ID NO 509    NM_021095    SEQ ID NO 1855
NM_000639    SEQ ID NO 510    NM_021101    SEQ ID NO 1856
NM_000647    SEQ ID NO 511    NM_021103    SEQ ID NO 1857
NM_000655    SEQ ID NO 512    NM_021128    SEQ ID NO 1858
NM_000662    SEQ ID NO 513    NM_021147    SEQ ID NO 1859
NM_000663    SEQ ID NO 514    NM_021151    SEQ ID NO 1860
NM_000666    SEQ ID NO 515    NM_021181    SEQ ID NO 1861
NM_000676    SEQ ID NO 516    NM_021190    SEQ ID NO 1862
NM_000685    SEQ ID NO 517    NM_021198    SEQ ID NO 1863
NM_000693    SEQ ID NO 518    NM_021200    SEQ ID NO 1864
NM_000699    SEQ ID NO 519    NM_021203    SEQ ID NO 1865
NM_000700    SEQ ID NO 520    NM_021238    SEQ ID NO 1866
NM_000712    SEQ ID NO 521    NM_021242    SEQ ID NO 1867
NM_000727    SEQ ID NO 522    S40706    SEQ ID NO 1869
NM_000732    SEQ ID NO 523    S53354    SEQ ID NO 1870
NM_000734    SEQ ID NO 524    S59184    SEQ ID NO 1871
NM_000767    SEQ ID NO 525    S62138    SEQ ID NO 1872
NM_000784    SEQ ID NO 526    U09848    SEQ ID NO 1873
NM_000802    SEQ ID NO 528    U10991    SEQ ID NO 1874
NM_000824    SEQ ID NO 529    U17077    SEQ ID NO 1875
NM_000849    SEQ ID NO 530    U18919    SEQ ID NO 1876
NM_000852    SEQ ID NO 531    U41387    SEQ ID NO 1877




WO 02/103320 PCT/US02/18947 



15 



25 



35 



GenBank Accession Number    SEQ ID NO    GenBank Accession Number    SEQ ID NO


NM_000874    SEQ ID NO 532    U45975    SEQ ID NO 1878
NM_000878    SEQ ID NO 533    U49835    SEQ ID NO 1879
NM_000884    SEQ ID NO 534    U56725    SEQ ID NO 1880
NM_000908    SEQ ID NO 537    U58033    SEQ ID NO 1881
NM_000909    SEQ ID NO 538    U61167    SEQ ID NO 1882
NM_000926    SEQ ID NO 539    U66042    SEQ ID NO 1883
NM_000930    SEQ ID NO 540    U68385    SEQ ID NO 1885
NM_000931    SEQ ID NO 541    U68494    SEQ ID NO 1886
NM_000947    SEQ ID NO 542    U74612    SEQ ID NO 1887
NM_000949    SEQ ID NO 543    U75968    SEQ ID NO 1888
NM_000950    SEQ ID NO 544    U79293    SEQ ID NO 1889
NM_000954    SEQ ID NO 545    U80736    SEQ ID NO 1890
NM_000964    SEQ ID NO 546    U82987    SEQ ID NO 1891
NM_001003    SEQ ID NO 549    U83115    SEQ ID NO 1892
NM_001016    SEQ ID NO 551    U89715    SEQ ID NO 1893


NM_001047


qpn in no vn 


U90916


SEQ ID NO 1894


NM_001066    SEQ ID NO 555    U92544    SEQ ID NO 1895
NM_001071    SEQ ID NO 556    U96131    SEQ ID NO 1896
NM_001078    SEQ ID NO 557    U96394    SEQ ID NO 1897
NM_001085    SEQ ID NO 558    W61000_RC    SEQ ID NO 1898
NM_001089    SEQ ID NO 559    X00437    SEQ ID NO 1899
NM_001109    SEQ ID NO 560    X00497    SEQ ID NO 1900
NM_001122    SEQ ID NO 561    X01394    SEQ ID NO 1901
NM_001124    SEQ ID NO 562    X03084    SEQ ID NO 1902
NM_001161    SEQ ID NO 563    X07834    SEQ ID NO 1905
NM_001165    SEQ ID NO 564    X14356    SEQ ID NO 1906
NM_001166    SEQ ID NO 565    X16302    SEQ ID NO 1907
NM_001168    SEQ ID NO 566    X52486    SEQ ID NO 1909
NM_001179    SEQ ID NO 567    X52882    SEQ ID NO 1910
NM_001185    SEQ ID NO 569    X56807    SEQ ID NO 1911
NM_001203    SEQ ID NO 570    X57809    SEQ ID NO 1912
NM_001207    SEQ ID NO 573    X57819    SEQ ID NO 1913
NM_001216    SEQ ID NO 574    X58529    SEQ ID NO 1914
NM_001218    SEQ ID NO 575    X59405    SEQ ID NO 1915
NM_001223    SEQ ID NO 576    X72475    SEQ ID NO 1918
NM_001225    SEQ ID NO 577    X73617    SEQ ID NO 1919
NM_001233    SEQ ID NO 578    X74794    SEQ ID NO 1920




NM_001236    SEQ ID NO 579    X75315    SEQ ID NO 1921
NM_001237    SEQ ID NO 580    X79782    SEQ ID NO 1922
NM_001251    SEQ ID NO 581    X82693    SEQ ID NO 1923
NM_001255    SEQ ID NO 582    X83301    SEQ ID NO 1924
NM_001262    SEQ ID NO 583    X93006    SEQ ID NO 1926
NM_001263    SEQ ID NO 584    X94232    SEQ ID NO 1927
NM_001267    SEQ ID NO 585    X98834    SEQ ID NO 1929
NM_001276    SEQ ID NO 587    X99142    SEQ ID NO 1930
NM_001280    SEQ ID NO 588    Y14737    SEQ ID NO 1932
NM_001282    SEQ ID NO 589    Z11887    SEQ ID NO 1933
NM_001295    SEQ ID NO 590    Z48633    SEQ ID NO 1935
NM_001305    SEQ ID NO 591    NM_004222    SEQ ID NO 1936
NM_001310    SEQ ID NO 592    NM_016405    SEQ ID NO 1937
NM_001312    SEQ ID NO 593    NM_017690    SEQ ID NO 1938


NM_001321


SEQ ID NO 594


Cn ntia2Q RO 


SEQ ID NO 1939


NM_001327


SEQ ID NO 595


Contig237_RC


SEQ ID NO 1940


NM_001329


SEQ ID NO 596


nontin2R3 RO 

wwl ltly|^.y/w rxv/ 


SEQ ID NO 1941


NM_001333


SEQ ID NO 597


vUl lllvj^w^ Fxy/ 


SEQ ID NO 1942


NM_001338    SEQ ID NO 598    Contig382_RC    SEQ ID NO 1944
NM_001360    SEQ ID NO 599    Contig399_RC    SEQ ID NO 1945
NM_001363    SEQ ID NO 600    Contig448_RC    SEQ ID NO 1946
NM_001381    SEQ ID NO 601    Contig569_RC    SEQ ID NO 1947
NM_001394    SEQ ID NO 602    Contig580_RC    SEQ ID NO 1948
NM_001395    SEQ ID NO 603    Contig678_RC    SEQ ID NO 1949


NM_001419


SEQ ID NO 604


Cnntia7nR RC 

Wwl IUU I WW 1 XV/ 


SEQ ID NO 1950


NM. 


.001424 


SEO ID NO 605 

vJ 1 — VX 11/ 1 >i V/ www 


Contia718 RC 


SEQ ID NO 1951 

VvL»VX IL/ MV 1 WW ■ 


NM_ 


.001428 


SEQ ID NO 606 

wLbV( 1 ' 1 MV V/ W W 


Contia719 RC 

v/ v/ 1 IUU • • W I W 


SEQ ID NO 1952 

V/^bV>C 1 L/ 1 »W 1 WV/i- 


NM_ 


.001436 


SEQ ID NO 607 


Contia742 RC 

WWI ■ VIVJ f Tfa | \V/ 


SEQ ID NO 1953 

V/ L>VK 1 L/ 1 ~ V/ 1 W V/V/ 


NM_ 


.001444 


SEQ ID NO 608 

wUa\K 1 1/ 1 lV/ WWW 


Contia753 RC 

v/wi • ww i xv/ 


SEQ ID NO 1954 

v/i_ vac 1 •-/ I it v/ i ww ■ 


NM. 


.001446 


SEQ ID NO 609 

WL— V*( II-/ INN-/ WWW 


Contia758 RC 

wwl iny • ww i ».v/ 


SEQ ID NO 1956 

V/I—Vk IL/ 1 l W 1 WWW 


NM_ 


.001453 


SEO ID NO 611 

UUVX It-/ INN-/ V/ 1 1 


Contla760 RC 

wui i i.i y # V/V/ 1 XV/ 


SEQ ID NO 1957 

V/L-iVaC 1 L/ ■ 11 V/ 1 W W f 


NM. 


.001456 


SFO ID NO 612 

OLW ILJ IMv v/ 1 


Onntia842 RC 

wwi I uy wt^. i \ v/ 


SEQ ID NO 1958 

v/L.Vx 11/ 1 M v/ IwwU 


NM. 


.001457 


SEO ID NO 613 

WL_V»< ■!-/ 1 iw W 1 W 


Contla848 RC 

WWI IUVjV» i xw 


SEQ ID NO 1959 

wL-Vx 11/ MW 1 WWW 


NM_001463    SEQ ID NO 614    Contig924_RC    SEQ ID NO 1960
NM_001465    SEQ ID NO 615    Contig974_RC    SEQ ID NO 1961
NM_001481    SEQ ID NO 616    Contig1018_RC    SEQ ID NO 1962
NM_001493    SEQ ID NO 617    Contig1056_RC    SEQ ID NO 1963
GenBank Accession Number    SEQ ID NO    GenBank Accession Number    SEQ ID NO


NM_001494    SEQ ID NO 618    Contig1061_RC    SEQ ID NO 1964
NM_001500    SEQ ID NO 619    Contig1129_RC    SEQ ID NO 1965
NM_001504    SEQ ID NO 620    Contig1148    SEQ ID NO 1966
NM_001511    SEQ ID NO 621    Contig1239_RC    SEQ ID NO 1967
NM_001513    SEQ ID NO 622    Contig1277    SEQ ID NO 1968
NM_001527    SEQ ID NO 623    Contig1333_RC    SEQ ID NO 1969
NM_001529    SEQ ID NO 624    Contig1386_RC    SEQ ID NO 1970
NM_001530    SEQ ID NO 625    Contig1389_RC    SEQ ID NO 1971
NM_001540    SEQ ID NO 626    Contig1418_RC    SEQ ID NO 1972
NM_001550    SEQ ID NO 627    Contig1462_RC    SEQ ID NO 1973
NM_001551    SEQ ID NO 628    Contig1505_RC    SEQ ID NO 1974
NM_001552    SEQ ID NO 629    Contig1540_RC    SEQ ID NO 1975
NM_001554    SEQ ID NO 631    Contig1584_RC    SEQ ID NO 1976
NM_001558    SEQ ID NO 632    Contig1fi32_RC    SEQ ID NO 1977
NM_001560    SEQ ID NO 633    Contig1fift?_RC    SEQ ID NO 1978
NM_001565    SEQ ID NO 634    Contig1778_RC    SEQ ID NO 1979
NM_001569    SEQ ID NO 635    Contig1829    SEQ ID NO 1981
NM_001605    SEQ ID NO 636    Contig1838_RC    SEQ ID NO 1982
NM_001609    SEQ ID NO 637    Contig1938_RC    SEQ ID NO 1983
NM_001615    SEQ ID NO 638    Contig1970_RC    SEQ ID NO 1984
NM_001623    SEQ ID NO 639    Contig1998_RC    SEQ ID NO 1985
NM_001627    SEQ ID NO 640    Contig2099_RC    SEQ ID NO 1986
NM_001628    SEQ ID NO 641    Contig2143_RC    SEQ ID NO 1987
NM_001630    SEQ ID NO 642    Contig2237_RC    SEQ ID NO 1988
NM_001634    SEQ ID NO 643    Contig2429_RC    SEQ ID NO 1990
NM_001656    SEQ ID NO 644    Contig2504_RC    SEQ ID NO 1991
NM_001673    SEQ ID NO 64    Contig2512_RC    SEQ ID NO 1992
NM_001675    SEQ ID NO 647    Contig2575_RC    SEQ ID NO 1993
NM_001679    SEQ ID NO 648    Contig2578_RC    SEQ ID NO 1994
NM_001689    SEQ ID NO 649    Contig2639_RC    SEQ ID NO 1995
NM_001703    SEQ ID NO 650    Contig2647_RC    SEQ ID NO 1996
NM_001710    SEQ ID NO 651    Contig2657_RC    SEQ ID NO 1997


NM_001725    SEQ ID NO 652    Contig2728_RC    SEQ ID NO 1998
NM_001730    SEQ ID NO 653    Contig2745_RC    SEQ ID NO 1999
NM_001733    SEQ ID NO 654    Contig2811_RC    SEQ ID NO 2000
NM_001734    SEQ ID NO 655    Contig2873_RC    SEQ ID NO 2001
NM_001740    SEQ ID NO 656    Contig2883_RC    SEQ ID NO 2002




NM_001745    SEQ ID NO 657    Contig9Q1S_RC    SEQ ID NO 2003
NM_001747    SEQ ID NO 658    Contig2Q98_RC    SEQ ID NO 2004
NM_001756    SEQ ID NO 659    Contig3094_RC    SEQ ID NO 2005
NM_001757    SEQ ID NO 660    Contig30Q4_RC    SEQ ID NO 2006
NM_001758    SEQ ID NO 661    Contig3164_RC    SEQ ID NO 2007
NM_001762    SEQ ID NO 662    Contig3495_RC    SEQ ID NO 2009
NM_001767    SEQ ID NO 663    Contig3R07_RC    SEQ ID NO 2010
NM_001770    SEQ ID NO 664    OUI I ULjv/Uv/v7 r\U    SEQ ID NO 2011
NM_001777    SEQ ID NO 665    Contig^677_RC    SEQ ID NO 2012
NM_001778    SEQ ID NO 666    Contig3689_RC    SEQ ID NO 2013
NM_001781    SEQ ID NO 667    Contig3734_RC    SEQ ID NO 2014
NM_001786    SEQ ID NO 668    Contig3R34_RC    SEQ ID NO 2015
NM_001793    SEQ ID NO 669    Contig3876_RC    SEQ ID NO 2016
NM_001803    SEQ ID NO 671    Contig3Q09_RC    SEQ ID NO 2017
NM_001806    SEQ ID NO 672    Contig^Q4n_RC    SEQ ID NO 2018
NM_001809    SEQ ID NO 673    Contig4**ftn_RC    SEQ ID NO 2019
NM_001814    SEQ ID NO 674    Contig4^88_RC    SEQ ID NO 2020
NM_001826    SEQ ID NO 67^    Contig4467_RC    SEQ ID NO 2021
NM_001830    SEQ ID NO 677    Contig4949_RC    SEQ ID NO 2023
NM_001838    SEQ ID NO 678    Contig5348_RC    SEQ ID NO 2024
NM_001839    SEQ ID NO 679    Contig5403_RC    SEQ ID NO 2025
NM_001853    SEQ ID NO 681    Contig5716_RC    SEQ ID NO 2026
NM_001859    SEQ ID NO 682    Contig6118_RC    SEQ ID NO 2027
NM_001861    SEQ ID NO 68^    Contig6164_RC    SEQ ID NO 2028
NM_001874    SEQ ID NO fift*i    ContigR1ft1_RC    SEQ ID NO 2029
NM_001885    SEQ ID NO 686    Contigfi^14_RC    SEQ ID NO 2030
NM_001892    SEQ ID NO 688    Contig6612_RC    SEQ ID NO 2031
NM_001897    SEQ ID NO 689    Contig6881_RC    SEQ ID NO 2032
NM_001899    SEQ ID NO 690    Contig8165_RC    SEQ ID NO 2033
NM_001905    SEQ ID NO 691    Contig8991_RC    SEQ ID NO 2034
NM_001912    SEQ ID NO 692    Contig8 < ^47_RC    SEQ ID NO 2035
NM_001914    SEQ ID NO 693    Contig8^84_RC    SEQ ID NO 90^8
NM_001919    SEQ ID NO 694    Contig8888_RC    SEQ ID NO 2038


NM_001941    SEQ ID NO 695    Contig9259_RC    SEQ ID NO 2039
NM_001943    SEQ ID NO 696    Contig9541_RC    SEQ ID NO 2040
NM_001944    SEQ ID NO 697    Contig10268_RC    SEQ ID NO 2041
NM_001953    SEQ ID NO 699    Contig10363_RC    SEQ ID NO 2042


NM_nmo^A    SEQ ID NO 700    Contig10437_RC    SEQ ID NO 2043
NM_nndQRR    SEQ ID NO 701    Contig11086_RC    SEQ ID NO 2045
NM_nrno^A    SEQ ID NO 70?    Contig11275_RC    SEQ ID NO 2046
NM_nnHQCQ    SEQ ID NO 70*}    Contig11fi48_RC    SEQ ID NO 2047
NM_An-lOP-l    SEQ ID NO 70^    Contig12216_RC    SEQ ID NO 2048
NM_HA-lQ7H    SEQ ID NO 706    Contig1238Q_RC    SEQ ID NO 2049
NM_nn*iQ7Q    SEQ ID NO 707    Contig12814_RC    SEQ ID NO 2050
NM_nrnopo    SEQ ID NO 708    Contig12951_RC    SEQ ID NO 2051
NM_nnoni7    SEQ ID NO 710    Contig13480_RC    SEQ ID NO 2052
NM_nnonQQ    SEQ ID NO 71^    Contig14284_RC    SEQ ID NO 2053
NM_nnon/ic    SEQ ID NO 714    Contig143Q0_RC    SEQ ID NO 2054
NM_aaoaA"7    SEQ ID NO 715    Contig147ft0_RC    SEQ ID NO 2055
NM_aaoac-i    SEQ ID NO 716    ContiglAQ^_RC    SEQ ID NO 2056
NM_002053    SEQ ID NO 717    ContigiAQftl_RC    SEQ ID NO 2057
NM_AAOrtCH    SEQ ID NO 718    Contig1^fiQ9_RC    SEQ ID NO 2058
NM_nnoncc    SEQ ID NO 719    Contig1fi1Q2_RC    SEQ ID NO 2059
NM_nnonfiQ    SEQ ID NO 720    Contig1fi7^Q_RC    SEQ ID NO 2061
NM_AHOH77    SEQ ID NO 722    Contig18788_RC    SEQ ID NO 2062
NM_AAOHQ1    SEQ ID NO 723    Contig18Q0|:i_RC    SEQ ID NO 2063
NM_AAOini    SEQ ID NO 724    Contig1710^_RC    SEQ ID NO 2064
NM_nnoiAft    SEQ ID NO 725    Contig1710^_RC    SEQ ID NO 2065
NM_aaoiin    SEQ ID NO 726    Contig17248_RC    SEQ ID NO 2066
NM_AAOI11    SEQ ID NO 727    Contig17^45_RC    SEQ ID NO 2067
NM_AAOl1^    SEQ ID NO 728    Contig18S02_RC    SEQ ID NO 2069
NM_AA011Q    SEQ ID NO 729    ContigPOI^R_RC    SEQ ID NO 2071
NM_AAOIOO    SEQ ID NO 730    Contig90^02_RC    SEQ ID NO 2073
NM_AH01Q1    SEQ ID NO 731    Contig90800_RC    SEQ ID NO 2074
NM_AHOIQft    SEQ ID NO 732    Contig90817_RC    SEQ ID NO 2075
NM_AAOI/IC    SEQ ID NO 733    Contig90R9Q_RC    SEQ ID NO 2076
NM_/"\n01£i/1    SEQ ID NO 734    Contig90RS1_RC    SEQ ID NO 2077
NM_AAO-lftQ    SEQ ID NO 735    Contig911^0_RC    SEQ ID NO 2078
NM_002184    SEQ ID NO 736    Contig911ft^_RC    SEQ ID NO 2079
NM_0091ftR    SEQ ID NO 737    Contig21421_RC    SEQ ID NO 2080


NM_002189    SEQ ID NO 738    Contig21787_RC    SEQ ID NO 2081
NM_002200    SEQ ID NO 739    Contig21812_RC    SEQ ID NO 2082
NM_002201    SEQ ID NO 740    Contig22418_RC    SEQ ID NO 2083
NM_002213    SEQ ID NO 741    Contig23085_RC    SEQ ID NO 2084


NM_002219    SEQ ID NO 742    Contig23454_RC    SEQ ID NO 2085
NM_002222    SEQ ID NO 743    Contig24138_RC    SEQ ID NO 2086
NM_002239    SEQ ID NO 744    Contig24252_RC    SEQ ID NO 2087
NM_002243    SEQ ID NO 745    Contig24655_RC    SEQ ID NO 2089
NM_002245    SEQ ID NO 746    Contig25055_RC    SEQ ID NO 2090
NM_002250    SEQ ID NO 747    Contig25290_RC    SEQ ID NO 2091
NM_002254    SEQ ID NO 748    Contig25343_RC    SEQ ID NO 2092
NM_002266    SEQ ID NO 749    Contig25362_RC    SEQ ID NO 2093
NM_002273    SEQ ID NO 750    Contig25617_RC    SEQ ID NO 2094
NM_002281    SEQ ID NO 751    Contig25659_RC    SEQ ID NO 2095
NM_002292    SEQ ID NO 752    Contig25722_RC    SEQ ID NO 2096
NM_002298    SEQ ID NO 753    Contig25809_RC    SEQ ID NO 2097
NM_002300    SEQ ID NO 754    Contig25991    SEQ ID NO 2098
NM_002308    SEQ ID NO 755    Contig26022_RC    SEQ ID NO 2099
NM_002314    SEQ ID NO 756    Contig26077_RC    SEQ ID NO 2100
NM_0023^7    SEQ ID NO 757    Contig26310_RC    SEQ ID NO 2101
NM_002341    SEQ ID NO 758    Contig26371_RC    SEQ ID NO 2102
NM_002342    SEQ ID NO 759    Contig26438_RC    SEQ ID NO 2103
NM_002346    SEQ ID NO 760    Contig26706_RC    SEQ ID NO 2104
NM_002349    SEQ ID NO 761    Contig27088_RC    SEQ ID NO 2105
NM_002^50    SEQ ID NO 762    Contig27186_RC    SEQ ID NO 2106
NM_002358    SEQ ID NO 763    Contig27228_RC    SEQ ID NO 2107
NM_002358    SEQ ID NO 764    Contig27344_RC    SEQ ID NO 2109
NM_002370    SEQ ID NO 765    Contig27386_RC    SEQ ID NO 2110
NM_0023Q5    SEQ ID NO 766    Contig27624_RC    SEQ ID NO 2111
NM_002416    SEQ ID NO 767    Contig27749_RC    SEQ ID NO 2112
NM_002421    SEQ ID NO 768    Contig27882_RC    SEQ ID NO 2113
NM_002426    SEQ ID NO 769    Contig27915_RC    SEQ ID NO 2114
NM_002435    SEQ ID NO 770    Contig28030_RC    SEQ ID NO 2115
NM_002438    SEQ ID NO 771    Contig28081_RC    SEQ ID NO 2116
NM_002444    SEQ ID NO 772    Contig28152_RC    SEQ ID NO 2117
NM_00244Q    SEQ ID NO 773    Contig28550_RC    SEQ ID NO 2119


NM_002450    SEQ ID NO 774    Contig28552_RC    SEQ ID NO 2120
NM_002456    SEQ ID NO 775    Contig28712_RC    SEQ ID NO 2121
NM_002466    SEQ ID NO 776    Contig28888_RC    SEQ ID NO 2122
NM_002482    SEQ ID NO 777    Contig28947_RC    SEQ ID NO 2123
NM_002497    SEQ ID NO 778    Contig29126_RC    SEQ ID NO 2124




NM_002510    SEQ ID NO 779    Contig29193_RC    SEQ ID NO 2125
NM_002515    SEQ ID NO 781    Contig29369_RC    SEQ ID NO 2126
NM_002524    SEQ ID NO 782    Contig29639_RC    SEQ ID NO 2127
NM_002539    SEQ ID NO 783    Contig30047_RC    SEQ ID NO 2129
NM_002555    SEQ ID NO 785    Contig30154_RC    SEQ ID NO 2131
NM_002570    SEQ ID NO 787    Contig3020Q_RC    SEQ ID NO 2132
NM_002579    SEQ ID NO 788    Contig30913_RC    SEQ ID NO 2133
NM_002587    SEQ ID NO 789    Contig30930_RC    SEQ ID NO 2134
NM_002590    SEQ ID NO 790    Contig309R7_RC    SEQ ID NO 2135
NM_002600    SEQ ID NO 791    Contig303Q0_RC    SEQ ID NO 2136
NM_    SEQ ID NO 792    oui i iiy outou r\v/    SEQ ID NO 2137
NM_002618    SEQ ID NO 794    Contig30fiOQ_RC    SEQ ID NO 2138
NM_002626    SEQ ID NO 795    Contig^OQ^M._RC    SEQ ID NO 2139
NM_002633    SEQ ID NO 796    Contig^l150_RC    SEQ ID NO 2140
NM_002639    SEQ ID NO 797    Contig^l1ftfi_RC    SEQ ID NO 2141
NM_002648    SEQ ID NO 798    Contig319^1_RC    SEQ ID NO 2142
NM_002659    SEQ ID NO 799    Contig319RR_RC    SEQ ID NO 2143
NM_002661    SEQ ID NO 800    Contig319Q1_RC    SEQ ID NO 2144
NM_002662    SEQ ID NO 801    Contig31295_RC    SEQ ID NO 2145
NM_002664    SEQ ID NO 802    Contig31424_RC    SEQ ID NO 2146
NM_002689    SEQ ID NO 804    Contig31449_RC    SEQ ID NO 2147
NM_002690    SEQ ID NO 805    Contig31*SQfi_RC    SEQ ID NO 2148
NM_002709    SEQ ID NO 806    Contig31ftR4_RC    SEQ ID NO 2149
NM_002727    SEQ ID NO 807    Contig31Q2R_RC    SEQ ID NO 2150
NM_002729    SEQ ID NO 808    Contig31QRR_RC    SEQ ID NO 2151
NM_002734    SEQ ID NO 809    Contig31Qftfi_RC    SEQ ID NO 2152
NM_002736    SEQ ID NO 810    Contig320ft4_RC    SEQ ID NO 2153
NM_002740    SEQ ID NO 811    Contig3910^_RC    SEQ ID NO 2154
NM_002748    SEQ ID NO 813    Contig321RS_RC    SEQ ID NO 2156
NM_002774    SEQ ID NO 814    Contig^99A9_RC    SEQ ID NO 2157
NM_002775    SEQ ID NO 815    Contig^9^#>9_RC    SEQ ID NO 2158
NM_002776    SEQ ID NO 816    v-rOnuyozooo^rv*/    SEQ ID NO 2159
NM_002789    SEQ ID NO 817    Contig325I58_RC    SEQ ID NO 2160


NM_002794    SEQ ID NO 818    Contig32798_RC    SEQ ID NO 2161
NM_002796    SEQ ID NO 819    Contig33005_RC    SEQ ID NO 2162
NM_002800    SEQ ID NO 820    Contig33230_RC    SEQ ID NO 2163
NM_002801    SEQ ID NO 821    Contig33260_RC    SEQ ID NO 2164




NM_002808    SEQ ID NO 822    Contig33654_RC    SEQ ID NO 2166
NM_002821    SEQ ID NO 824    Contig33741_RC    SEQ ID NO 2167
NM_002826    SEQ ID NO 825    Contig33771_RC    SEQ ID NO 2168
NM_002827    SEQ ID NO 826    Contig33814_RC    SEQ ID NO 2169
NM_002838    SEQ ID NO 827    Contig33815_RC    SEQ ID NO 2170
NM_002852    SEQ ID NO 828    Contig33833    SEQ ID NO 2171
NM_002854    SEQ ID NO 829    Contig33QQ8_RC    SEQ ID NO 2172
NM_002856    SEQ ID NO 830    Contig3407Q    SEQ ID NO 2173
NM_002857    SEQ ID NO 831    Contig^4fi80_RC    SEQ ID NO 2174
NM_002858    SEQ ID NO 832    Contig34222_RC    SEQ ID NO 2175
NM_002888    SEQ ID NO 833    Contig34233_RC    SEQ ID NO 2176
NM_002890    SEQ ID NO 834    Contig34303_RC    SEQ ID NO 2177
NM_002901    SEQ ID NO 836    Contig343Q3_RC    SEQ ID NO 2178
NM_002906    SEQ ID NO 837    Contig34477_RC    SEQ ID NO 2179
NM_002916    SEQ ID NO 838    Contig347fifi_RC    SEQ ID NO 2181
NM_002923    SEQ ID NO 839    Contig34Q*>2    SEQ ID NO 2182
NM_002933    SEQ ID NO 840    Contig34989_RC    SEQ ID NO 2183
NM_002936    SEQ ID NO 841    Contig35030_RC    SEQ ID NO 2184
NM_002937    SEQ ID NO 842    Contig35251_RC    SEQ ID NO 2185
NM_002950    SEQ ID NO 843    Contig3,Sfi2Q_RC    SEQ ID NO 2186
NM_002961    SEQ ID NO 844    Contig35635_RC    SEQ ID NO 2187
NM_002964    SEQ ID NO 845    Contig^783_RC    SEQ ID NO 2188
NM_002965    SEQ ID NO 846    Contig^R814_RC    SEQ ID NO 2189
NM_002966    SEQ ID NO 847    Contig3R8Qfi_RC    SEQ ID NO 2190
NM_002982    SEQ ID NO 849    Contig3*W7fi_RC    SEQ ID NO 2191
NM_002983    SEQ ID NO 850    oon uyoou^r^ r\v    SEQ ID NO 2192
NM_002984    SEQ ID NO 851    Contig3fifi81_RC    SEQ ID NO 2193
NM_002985    SEQ ID NO 852    Contig381^2_RC    SEQ ID NO 2194
NM_002988    SEQ ID NO 853    Contig381Q3_RC    SEQ ID NO 2195
NM_002996    SEQ ID NO 854    Contig38312_RC    SEQ ID NO 2196
NM_002997    SEQ ID NO 855    Contig3fi393_RC    SEQ ID NO 2197
NM_002999    SEQ ID NO 856    Contig3fi33Q_RC    SEQ ID NO 2198
NM_003012    SEQ ID NO 857    Contig3R647_RC    SEQ ID NO 2199


NM_003022    SEQ ID NO 858    Contig36744_RC    SEQ ID NO 2200
NM_003034    SEQ ID NO 859    Contig36761_RC    SEQ ID NO 2201
NM_003035    SEQ ID NO 860    Contig36879_RC    SEQ ID NO 2202
NM_003039    SEQ ID NO 861    Contig36900_RC    SEQ ID NO 2203


Mii/i nnonc-i ccn in mo pro 
NM UUoUol OtU IU NU oOZ 


Pnntiri'V701 *^ RP QCH IH MO 9904 

our uiy o / u 10 r\u ocu il/ inw ^cu*t 


NM UUoUo4 otU IU NU ODo 


Pnntin'*709A RP QFO !R NO 990^ 
UUFillvjO / Uc4 r\v OCU IL/ l\U t^UJ 


kim AAQH^^ CPO IP! MO PAid 
NM UUOUDD OtU IU INU OD^ 


Pnntin'37079 RH QFO ID NO 9907 


NM UUOUOO otU IU InU ODO 


Pnntin'V7140 RP QPH m NO 9908 


maa nninoo oco in mo P££ 
IMM UVoU&V OCU IU NU ODD 


Pr\n+in^7l2l1 RP CCO IH NO 990Q 


MlVil nf\*af\0£5 O CO in MO P£%7 

NM UUoUyo otU IU NU 00/ 


Prinfin'*790A RP OCO in MO 9910 S 

uoniigor ^u*f r\u ocu iu inu c.jl iu 


mki nnor^Afi ocn in mo acq 
NM OUoUyy otU IU NU ooo 


Pnn+in^79fti RP CFO in NO 9911 
L/OniiyOf ^0 I r\U OCU IU INU \ 1 


rika nno-iAo o no in MO qcq 
NM OUolU^ otU IU NU ooy 


Pnntin^79ft7 RP CFO in MO 9919 I 

uonugo/ zo / r\u otiu iu inu zz iz 


K ik j AAOHAvl O C/"\ 1 PV KIO Q7H 

NM 003104 otu IU NU o/U 


rnnfin^7A^Q RP QPP m MO 991 

ooniigo / ^foy r\o otu iu invj io 


max nr\*24r\Q oc/^ in MO 07-1 
NM UOolOo otu IU NU on 


Pr»n+iriQ7^R9 RO OCO in MO 9914 

ooniigo/ ooz. r\0 ocu iu inu i*f 


miv;! nriQ^o-i ocn in mo P7*} 
NM OUol^l otU IU INU Of o 


Pontln^7^71 RP QPO in NO 991^ 
uui uiy o i o i i r\u ocw il/ inv-/ i«j 


mivvi nnQ-iQyf ceo m mo pxi 
NM uUolo4 otU IU INUo/h 


Pnnfio^7^QP QCO in MO 991f» 

uunuyo/ uyo ocw il/ inv 


miva nn*id*i"7 ceo m mo P7^ 
NM UUolo/ otU IU INU 0(0 


Pnntin^77* : ift RP QPO ID NO 9917 


kir a r\rvo aaa oco in KIO Q7^ 
NM 0Uo144 otUIUNUo/O 


Prkn+in^777P RP QCO in MO 991ft 

oonxigo/ # / o i\U ocu il/ iwj c-c. io 


My nnO^yfC OCO in MO C77 

NM UUo14t> otU IU INU O/ / 


Pr\nfin^7Pft/l RP CCA in MO 99 1Q I 


MKil AnOHylQ OCO in MO Q7Q 

NM 00o14y otU IU INU o(o 


Prvntin^7QAft RP CPO in MO 9990 


My AHQ-lC^ OCO in MO Q7Q 

NM_00o JO I otU IU INU o/y 


Pnnfin^ft170 RP C.PO in WO 9991 

oonugoo i # u r\u ocu il/ inu 1 


ki iv >i nnQ*iK7 ceo in mo ftftn 
NM OOolOf otU IU INU OOU 


Prfcntin^ft9ftft RP QFO in NO 999^ 


mm nnnw ceo in mo ppi 
NM UUolOO OtU IU INU oo 1 


Pnntin^ft^Qft RP QFO IO NO 995 '4 
Lrunuyooo^o r\u ocw iu inu ^.^.^.t 


mivji nnnfiK ceo in mo ppo 

NM UUolDO otU IU INU OOZ 


Pnntin^ft^ftO RP QFO m NO 999fi 

UUl llljjOQUQU rVU OCU IL/ INU £-£-£-\J 


K\tiji nriH70 ceo in mo qqq 
NM_JJ0o i f Z otU IU INU OOO 


Pnntin^ftR^O RP QFO in NO 9997 
uunugooDou r\u ocu il/ inu t-t-*- » 


Kip. a nno-177 OCO in KIO QQ A 

NM_0Uo1 / / otU IU INU oo4 


Prknfio^ftR £ \9 RP QFO in MO 999ft 

uoniigoooo/£ rvu ocu il/ inu ^.^./lo 


Mil onoHn7 oco in mo qqc 
NM_00o1y/ otU IU NU ooo 


Prkotin^Rftft^ RP oco in MO 999Q 

uoniigooooo r\u ocu iu inu 


k fi^yi Anoono oco in mo qqa 
NM OOozOZ otU IU INU ooO 


r*r\r\k\rnSC70fc RP CFO in MO 99^0 
uontigoOf ZX> r\U OCU IL/ INU ZZOU 


k i k yt AAOOHO OCO in MO QQ7 

NM__00o^1o otU IU NU oof 


Pnntin^ft7Q1 RP CCfj in NO 99^1 

uonxigoo / y 1 r\u otu iu inu £.z.o i 


k i(v 4 AA004 7 OCO in MO QQQ 

NM UOoZI / otU IU NU ooo 


Pnn+in^ftQOI RP QFO in NO 99^9 

uonugooyu i_ _r\u ocu il/ inu 


Mivyi AAQOOC OCO in MO QQQ 

NM \J\jo/LZo otU IU NU ooy 


PrM-itin^ftQft^ RP QFO IO NO 99^ 
uunugooyoo rxu ocu il/ inu ctoo 


My AAQOOC CCO in KIO POH 

NM yJUoZZo otU IU NU oyu 


Pr\ntin^QOQO RP QFO in NO 99^4 

uunugoyuyu r\u ocu il/ inu ^^o*+ 


KltiA AAOOOC oco in KIO POO 

NM OUoZob otU IU NU oyz 


Pnn+in^Q1^9 RP QFO in NO 99^ 

uonxigoy ioz r\u ocu il/ inu t^ou 


KIKil AHOOOA OCO 1 1~\ KIO QAO 

NM OOo^oy otU IU NU oyo 


n/M^+in^QI C\7 RP CFO in NO 99^fi 

uonugoy 10/ r\u ocw il/ inu ^400 


NM 003248 otU IU NU 894 


Printin^QOOft RP CFO in NO 99^7 

L/Oniigoyz^D rsu ocu iu inu c-t-o i 


NM 003255 ocU IU NU oyo 


PAn+inQQOQ^ Dp CFO in MO 99^ft 

uonugoyzoo r\u ocu iu inu zzoo 


kkva oaoocq ceo in mo pq£ 
NM UUozoo otu iu nu oyo 


Pnntin^Q^fi RP RFO in NO 9239 


NM_003264    SEQ ID NO 897    Contig39591_RC    SEQ ID NO 2240
NM_003283    SEQ ID NO 898    Contig39826_RC    SEQ ID NO 2241
NM_003318    SEQ ID NO 899    Contig39845_RC    SEQ ID NO 2242
NM_003329    SEQ ID NO 900    Contig39891_RC    SEQ ID NO 2243


NM_ 


.003332 


SEQ ID NO 901 


Contia39922 RC 


SEQ ID NO 2244 


NM_ 


003358 


SEO ID NO 902 


Contia39960 RC 


SEQ ID NO 2245 


NM_ 


.003359 


SFO ID NO Q03 

OL-W 1 U 1 N V-/ v?UJ 


Contia40026 RC 


SEQ ID NO 2246 


NM_ 


003360 


SFQ ID NO 904 


Contia40121 RC 


SEQ ID NO 2247 


NM_ 


003368 


SFQ ID NO 905 


Contia40128 RC 


SEQ ID NO 2248 i 


NM_ 


.003376 


SFO ID NO 906 


Onntin40146 


SEQ ID NO 2249 


NM_ 


.003380 


SEO ID NO 907 


Contia40208 RC 


SEQ ID NO 2250 


NM_ 


.003392 


CCA in NO QOft 
OLW IU INVJ C7UO 


Onntin409 12 RC 


SEO ID NO 2251 


NM_ 


.003412 


CiFO in MO QHQ 


Pnntin4093R RO 
uui iuy*TiJfaOQ _ .rxu 


SFO ID NO 99^9 I 


NM 


003430 


oca in mo oin 
otu iu invj y iu 


uun iiy^rU'rO'T r\u 


cnn in MO 99*53 

O^W IU INU £-£-yJO 


NM 


003462 


opn in mo cm 1 


UU 1 1 Liy *r U*t*t u rxU 


SFO IO MO 99^4. ^ 

OuVx IU INU Jt 


NM_ 


.003467 


opn in mo cm 9 


UUJ 1 Uy*rUUUU r\v 


SFO ID MO 99^*5 

OL.U IU INU fc^vJ 


NM_ 


.003472 


opo in kin CM - } 


Pnntin40*i73 RH 
uui luytuu i o rxu 


SFO ID NO 99^ 

OLU IU InU LLJU 


NM_ 


.003479 


qcn in MO Q14 

OCW IU INVJ 57 It 


Pnntin40813 RC 


SFO ID NO 2258 * 

Ot.U IU INw f >£>..\J\J 


NM_ 


.003489 


cpn m NO 91^ 


Or*ntia40816 RC 

uui iLiy*Tww i u rx w 


SEO ID NO 2259 


NM_ 


.003494 


opn m MO Q1fi 

OC_W IU INVJ C7 ! U 


PnnWn40R4^ RC 

UUI 1 Liy*+L/0*Tvl FxU 


SFO ID MO 22fi1 

O 1 IU INU £m£m\3 1 


NM_ 


.003498 


opn m WO Q17 

OCW IU INVJ C7 I / 


Pnntin40R89 RC 

UUI ILiy*+L/OvJv7 r\v 


SEO ID NO 2262 


NM_ 


.003504 


opn in MO Q1Q 

Ol_W IU INVJ C7 I <j 


Pnntin4103^ 


SFO ID NO 2263 


NM_ 


.003508 


CFO in NO 990 
ol_w iu invj 


Pontin41234 RC 
uui luyt i faw*T rvu 


SEO ID NO 2264 


NM_ 


.003510 


SFO ID NO 991 

OuW IU INVJ \J £- 1 


Pnntin41413 RC 

UUI 1 liy*T It 1 \J Ix w 


SEO ID NO 2266 


NM_003512 SEQ ID NO 922
Contig41521_RC SEQ ID NO 2267
NM_003528 SEQ ID NO 923
Contig41530_RC SEQ ID NO 2268
NM_003544 SEQ ID NO 924
Contig41590 SEQ ID NO 2269
NM_003561 SEQ ID NO 925
Contig41618_RC SEQ ID NO 2270


NM_003563


opn m NO Q9fi 

OLW IU INVJ <J/L\J 


Pontin41R24 RC 
uui i uy*T i Ufa*t rxu 


SEQ ID NO 2271


NM_003568


qpn m NO Q97 

OCW IU INVJ Xjc- I 


Pnntin41fi3*S RC 
uui iuy*r i Uvv rxu 


SEQ ID NO 2272


NM_003579


cpn m MO Q9R 


Pnntin41fi7R RC 

UUI iuy fc r I U I U r\v 


SEQ ID NO 2273


NM_003600 SEQ ID NO 929
Contig41689_RC SEQ ID NO 2274
NM_003615 SEQ ID NO 931
Contig41804_RC SEQ ID NO 2275
NM_003627 SEQ ID NO 932
Contig41887_RC SEQ ID NO 2276


NM_003645


SFO ID NO 93^ 

uLW IU INVJ C700 


Pnntln4190 , S RC 
uui iLiy*T i C7 v/ \j r\v 


SEQ ID NO 2277


NM_003651


SFO in MO Q3fi 

OCw IU INVJ c/OvJ 


PnntinJ.1 Q^4 RP 

UUI luy*T 1 Ovrr ixU 


SEQ ID NO 2278


NM_003657 SEQ ID NO 937
Contig41983_RC SEQ ID NO 2279
NM_003662 SEQ ID NO 938
Contig42006_RC SEQ ID NO 2280
NM_003670 SEQ ID NO 939
Contig42014_RC SEQ ID NO 2281
NM_003675 SEQ ID NO 940
Contig42036_RC SEQ ID NO 2282
NM_003676 SEQ ID NO 941
Contig42041_RC SEQ ID NO 2283


NM_003681


qpn in NO Q49 


Contig42139 SEQ ID NO 2284


NM_003683


9FO ID NO 


Contig42161_RC SEQ ID NO 2285


NM_003686


cpn in NO Q44 

OlU ILJ InU v7*t*t 


Contig42220_RC SEQ ID NO 2286


NM_003689


qpo in Kin qar 


Onntin42^0fi RC 

UvJi ILiy*T*.OUU f\v 


SEQ ID NO 2287 


NM_003714


qpo in mo ctdfi 


Pnntin42^1 1 RC 


SEQ ID NO 2288 


NM_003720


qpn in MO Q47 


Contig42313_RC SEQ ID NO 2289


NM_003726


ocn in MO Q4ft 

OCW IU INU v7HO 


Onntin4?402 RC 

ULJI iUy"<c-. fc rV/«£. I x\/ 


SEQ ID NO 2290 


NM_003729


ciFO in NO Q4Q * 


Contig42421_RC SEQ ID NO 2291


NM_003740


CiFO in NO Q^ft 


Contig42430_RC SEQ ID NO 2292


NM_003772


cpn in NO QS2 

Ot.\i( IU INU ZJ\JC- 


Contig42431_RC SEQ ID NO 2293


NM_003791


OCW IU INU C\JJ 


Contig42542_RC SEQ ID NO 2294


NM_003793


ccn m NO Q*S4 

OLU IU INU C7U*-r 


Contig42582 SEQ ID NO 2295


NM_003795


OCW IU INU %7\J*J 


Contio42B31 RC 


SEQ ID NO 2296 


NM_003806


<^FO in NO Q^fi 

O L_U IU INU \JsJ\J 


Contig42751_RC SEQ ID NO 2297


NM_003821


ccn in NO 

OCU IU INU \3\J 1 


Contig42759_RC SEQ ID NO 2298


NM_003829


ccn in NO Q*Sft 

OlU IU INU v7JO 


Cnntin430*S4 

V/UI lLiy"'Twv7w*T 


SEQ ID NO 2299 


NM_003831


ccn m NO Q'SQ 

OCU IU INU 


Contig43079_RC SEQ ID NO 2300


NM_003862


ccn m NO QRO 

OlU IU INU 


Contig43195_RC SEQ ID NO 2301


NM_003866


9FO in NO QR1 

OlU IU INU C7U 1 


Contig43368_RC SEQ ID NO 2302


NM_003875


ccn m NO QR9 

OCU IU INU C7U.C 


Contig43410_RC SEQ ID NO 2303


NM_003878


ccn m NO Qfi^ 

OCU IU INU v?UO 


Onntin4347R RC 


SEQ ID NO 2304 


NM_003894


ccn m NO QR^ 

OCU IU INU C7Uv-> 


Contig43549_RC SEQ ID NO 2305


NM_003897


ccn m MO Qfifi 

OCU IU INU 57UU 


UvJi luy^tovtv ixu 


SEQ ID NO 2306 


NM_003904


ccn m NO QR7 
ocu iu inu yor 


Onntin43R48 RC 

Uwl ILly*Twx/*Tv/ IxU 


SEQ ID NO 2307 

V/L_XJ< IL/ 1 IV/ fcV/w f 


NM_003929


ccn m NO Qfift 
ocu iu inu yoo 


Pnntin43R73 RC 
UvJi iuy*Tww i w ix u 


SEQ ID NO 2308 


NM_003933


ccn m NO QRQ 

OCU IU INU v7U57 


Contin4367Q RC 

UVJI lliy*TwVJf %7 IxU 


SEQ ID NO 2309 


NM_003937


ccn m NO Q70 

OCU IU INU %7 1 \J 


Contia43R94 RC 


SEQ ID NO 2310 


NM_003940


^FO in NO Q71 

OCU IU INU v7 f 1 


Contig43747_RC SEQ ID NO 2311


NM_003942


ccn m NO Q79 

OCU IU INU C7 / 


Cnntin43Q18 RC 


SEQ ID NO 2312 


NM_003944


<^FO in NO Q7*3 

OCU IU INU J7f O 


Contig43983_RC SEQ ID NO 2313


NM_003953


^5FO in NO Q74 

OCU IU INU %7/*t 


Contig44040_RC SEQ ID NO 2314


NM_003954


ccn m NO Q7^ 

OCU IU INU 57 / yJ 


Contig44064_RC SEQ ID NO 2315


NM_003975 SEQ ID NO 976
Contig44195_RC SEQ ID NO 2316
NM_003981 SEQ ID NO 977
Contig44226_RC SEQ ID NO 2317
NM_003982 SEQ ID NO 978
Contig44289_RC SEQ ID NO 2320
NM_003986 SEQ ID NO 979
Contig44310_RC SEQ ID NO 2321
NM_004003 SEQ ID NO 980
Contig44409 SEQ ID NO 2322


NM_004010 SEQ ID NO 981


Contig44413_RC SEQ ID NO 2323


NM_004024 SEQ ID NO 982 


Contig44451_RC SEQ ID NO 2324 


NM_004038 SEQ ID NO 983 


Contig44585_RC SEQ ID NO 2325 


NM_004049 SEQ ID NO 984


Contig44656_RC SEQ ID NO 2326 


NM_004052 SEQ ID NO 985 


Contig44703_RC SEQ ID NO 2327 


NM_004053 SEQ ID NO 986 


Contig44708_RC SEQ ID NO 2328 


NM_004079 SEQ ID NO 987 


Contig44757_RC SEQ ID NO 2329 


NM_004104 SEQ ID NO 988


Contig44829_RC SEQ ID NO 2331 


NM_004109 SEQ ID NO 989 


Contig44870 SEQ ID NO 2332 


NM_004110 SEQ ID NO 990


Contig44893_RC SEQ ID NO 2333 


NM_004120 SEQ ID NO 991 


Contig44909_RC SEQ ID NO 2334 


NM_004131 SEQ ID NO 992 


Contig44939_RC SEQ ID NO 2335 


NM_004143 SEQ ID NO 993


Contig45022_RC SEQ ID NO 2336 


NM_004154 SEQ ID NO 994


Contig45032_RC SEQ ID NO 2337 


NM_004170 SEQ ID NO 996


Contig45041_RC SEQ ID NO 2338


NM_004172 SEQ ID NO 997 


Contig45049_RC SEQ ID NO 2339 


NM_004176 SEQ ID NO 998


Contig45090_RC SEQ ID NO 2340 


NM_004180 SEQ ID NO 999


Contig45156_RC SEQ ID NO 2341 


NM_004181 SEQ ID NO 1000 


Contig45316_RC SEQ ID NO 2342


NM_004184 SEQ ID NO 1001 


Contig45321 SEQ ID NO 2343 


NM_004203 SEQ ID NO 1002 


Contig45375_RC SEQ ID NO 2345 


NM_004207 SEQ ID NO 1003 


Contig45443_RC SEQ ID NO 2346 


NM_004217 SEQ ID NO 1004


Contig45454_RC SEQ ID NO 2347 


NM_004219 SEQ ID NO 1005


Contig45537_RC SEQ ID NO 2348 


NM_004221 SEQ ID NO 1006


Contig45588_RC SEQ ID NO 2349 


NM_004233 SEQ ID NO 1007


Contig45708_RC SEQ ID NO 2350 


NM_004244 SEQ ID NO 1008 


Contig45816_RC SEQ ID NO 2351 


NM_004252 SEQ ID NO 1009 


Contig45847_RC SEQ ID NO 2352 


NM_004265 SEQ ID NO 1010 


Contig45891_RC SEQ ID NO 2353 


NM_004267 SEQ ID NO 1011


Contig46056_RC SEQ ID NO 2354


NM_004281 SEQ ID NO 1012 


Contig46062_RC SEQ ID NO 2355 


NM_004289 SEQ ID NO 1013 


Contig46075_RC SEQ ID NO 2356


NM_004298 SEQ ID NO 1015 


Contig46164_RC SEQ ID NO 2357 


NM_004301 SEQ ID NO 1016 


Contig46218_RC SEQ ID NO 2358


NM_004305 SEQ ID NO 1017


Contig46223_RC SEQ ID NO 2359 


NM_004311 SEQ ID NO 1018


Contig46244_RC SEQ ID NO 2360 


NM_004315 SEQ ID NO 1019


Contig46262_RC SEQ ID NO 2361




NM_004323 SEQ ID NO 1020
Contig46362_RC SEQ ID NO 2364
NM_004330 SEQ ID NO 1021
Contig46443_RC SEQ ID NO 2365
NM_004336 SEQ ID NO 1022
Contig46553_RC SEQ ID NO 2367
NM_004338 SEQ ID NO 1023
Contig46597_RC SEQ ID NO 2368
NM_004350 SEQ ID NO 1024
Contig46653_RC SEQ ID NO 2369
NM_004354 SEQ ID NO 1025
Contig46709_RC SEQ ID NO 2370
NM_004358 SEQ ID NO 1026
Contig46777_RC SEQ ID NO 2371
NM_004360 SEQ ID NO 1027
Contig46802_RC SEQ ID NO 2372
NM_004362 SEQ ID NO 1028
Contig46890_RC SEQ ID NO 2374
NM_004374 SEQ ID NO 1029
Contig46922_RC SEQ ID NO 2375
NM_004378 SEQ ID NO 1030
Contig46934_RC SEQ ID NO 2376
NM_004392 SEQ ID NO 1031
Contig46937_RC SEQ ID NO 2377
NM_004395 SEQ ID NO 1032
Contig46991_RC SEQ ID NO 2378
NM_004414 SEQ ID NO 1033
Contig47016_RC SEQ ID NO 2379
NM_004418 SEQ ID NO 1034
Contig47045_RC SEQ ID NO 2380
NM_004425 SEQ ID NO 1035
Contig47106_RC SEQ ID NO 2381
NM_004431 SEQ ID NO 1036
Contig47146_RC SEQ ID NO 2382
NM_004436 SEQ ID NO 1037
Contig47230_RC SEQ ID NO 2383
NM_004438 SEQ ID NO 1038
Contig47405_RC SEQ ID NO 2384
NM_004443 SEQ ID NO 1039
Contig47456_RC SEQ ID NO 2385
NM_004446 SEQ ID NO 1040
Contig47465_RC SEQ ID NO 2386
NM_004451 SEQ ID NO 1041
Contig47498_RC SEQ ID NO 2387
NM_004454 SEQ ID NO 1042
Contig47578_RC SEQ ID NO 2388
NM_004456 SEQ ID NO 1043
Contig47645_RC SEQ ID NO 2389
NM_004458 SEQ ID NO 1044
Contig47680_RC SEQ ID NO 2390
NM_004472 SEQ ID NO 1045
Contig47781_RC SEQ ID NO 2391
NM_004480 SEQ ID NO 1046
Contig47814_RC SEQ ID NO 2392
NM_004482 SEQ ID NO 1047
Contig48004_RC SEQ ID NO 2393
NM_004494 SEQ ID NO 1048
Contig48043_RC SEQ ID NO 2394
NM_004496 SEQ ID NO 1049
Contig48057_RC SEQ ID NO 2395
NM_004503 SEQ ID NO 1050
Contig48076_RC SEQ ID NO 2396
NM_004504 SEQ ID NO 1051
Contig48249_RC SEQ ID NO 2397
NM_004515 SEQ ID NO 1052
Contig48263_RC SEQ ID NO 2398
NM_004522 SEQ ID NO 1053
Contig48270_RC SEQ ID NO 2399
NM_004523 SEQ ID NO 1054
Contig48328_RC SEQ ID NO 2400
NM_004525 SEQ ID NO 1055
Contig48518_RC SEQ ID NO 2401
NM_004556 SEQ ID NO 1056
Contig48572_RC SEQ ID NO 2402


NM_004559 SEQ ID NO 1057
Contig48659_RC SEQ ID NO 2403
NM_004569 SEQ ID NO 1058
Contig48722_RC SEQ ID NO 2404
NM_004577 SEQ ID NO 1059
Contig48774_RC SEQ ID NO 2405
NM_004585 SEQ ID NO 1060
Contig48776_RC SEQ ID NO 2406
NM_004587 SEQ ID NO 1061
Contig48800_RC SEQ ID NO 2407
NM_004594 SEQ ID NO 1062
Contig48806_RC SEQ ID NO 2408
NM_004599 SEQ ID NO 1063
Contig48852_RC SEQ ID NO 2409
NM_004633 SEQ ID NO 1066
Contig48900_RC SEQ ID NO 2410
NM_004642 SEQ ID NO 1067
Contig48913_RC SEQ ID NO 2411
NM_004648 SEQ ID NO 1068
Contig48970_RC SEQ ID NO 2413
NM_004663 SEQ ID NO 1069
Contig49058_RC SEQ ID NO 2414
NM_004664 SEQ ID NO 1070
Contig49063_RC SEQ ID NO 2415
NM_004684 SEQ ID NO 1071
Contig49093 SEQ ID NO 2416
NM_004688 SEQ ID NO 1072


Contig49098_RC SEQ ID NO 2417


NM_004694 SEQ ID NO 1073


Contia491fiQ RC 

\-/k/i iiiy*TC7 i VJC7 i \v/ 


SEQ ID NO 2418


NM_004695 SEQ ID NO 1074
Contig49233_RC SEQ ID NO 2419
NM_004701 SEQ ID NO 1075
Contig49270_RC SEQ ID NO 2420
NM_004708 SEQ ID NO 1077
Contig49282_RC SEQ ID NO 2421
NM_004711 SEQ ID NO 1078
Contig49289_RC SEQ ID NO 2422
NM_004726 SEQ ID NO 1079
Contig49342_RC SEQ ID NO 2423
NM_004750 SEQ ID NO 1081
Contig49344 SEQ ID NO 2424
NM_004761 SEQ ID NO 1082
Contig49388_RC SEQ ID NO 2425
NM_004762 SEQ ID NO 1083
Contig49405_RC SEQ ID NO 2426
NM_004780 SEQ ID NO 1085
Contig49445_RC SEQ ID NO 2427
NM_004791 SEQ ID NO 1086
Contig49468_RC SEQ ID NO 2428
NM_004798 SEQ ID NO 1087
Contig49509_RC SEQ ID NO 2429
NM_004808 SEQ ID NO 1088
Contig49578_RC SEQ ID NO 2431
NM_004811 SEQ ID NO 1089
Contig49581_RC SEQ ID NO 2432
NM_004833 SEQ ID NO 1090
Contig49631_RC SEQ ID NO 2433
NM_004835 SEQ ID NO 1091
Contig49673_RC SEQ ID NO 2435
NM_004843 SEQ ID NO 1092
Contig49743_RC SEQ ID NO 2436
NM_004847 SEQ ID NO 1093
Contig49790_RC SEQ ID NO 2437
NM_004848 SEQ ID NO 1094
Contig49818_RC SEQ ID NO 2438
NM_004864 SEQ ID NO 1095
Contig49849_RC SEQ ID NO 2439
NM_004865 SEQ ID NO 1096
Contig49855 SEQ ID NO 2440
NM_004866 SEQ ID NO 1097
Contig49910_RC SEQ ID NO 2441
NM_004877 SEQ ID NO 1098
Contig49948_RC SEQ ID NO 2442




NM_004900 SEQ ID NO 1099
Contig50004_RC SEQ ID NO 2443
NM_004906 SEQ ID NO 1100


Hnntin *S00Q4 
V-/V/I my wwv/w~ 


SEQ ID NO 2444 


NM_004910 SEQ ID NO 1101
Contig50120_RC SEQ ID NO 2446
NM_004918 SEQ ID NO 1103
Contig50153_RC SEQ ID NO 2447
NM_004923 SEQ ID NO 1104
Contig50189_RC SEQ ID NO 2448
NM_004938 SEQ ID NO 1105
Contig50276_RC SEQ ID NO 2449
NM_004951 SEQ ID NO 1106
Contig50288_RC SEQ ID NO 2450
NM_004968 SEQ ID NO 1107
Contig50297_RC SEQ ID NO 2451
NM_004994 SEQ ID NO 1108
Contig50391_RC SEQ ID NO 2452
NM_004999 SEQ ID NO 1109
Contig50410 SEQ ID NO 2453
NM_005001 SEQ ID NO 1110
Contig50523_RC SEQ ID NO 2454
NM_005002 SEQ ID NO 1111
Contig50529 SEQ ID NO 2455
NM_005012 SEQ ID NO 1112
Contig50588_RC SEQ ID NO 2456


NM 

I NIVI 


\J\JyJ\J\j£m 


SEQ ID NO 1113
Contig50592 SEQ ID NO 2457


NM_005044 SEQ ID NO 1114


Cnntin'SOfifiQ RC 


SEQ ID NO 2458 


NM 

NlVl 




oca in MO 111*^ 
OlU IL/ InvJ I I IU 


Pnntin^071Q RC 
uUl iiiyju / 15 r\v/ 


SEQ ID NO 2460 


NM 

IN IVI 


00*S04Q 


opn in mo 111R 

OlW IL/ INU I I ID 


^0^^^10798 RC 


SEQ ID NO 2461 


NM_005067 SEQ ID NO 1117


Pnntin^07^1 RC 
uuiiuyov/f o i rv v> 


SEQ ID NO 2462 

\— / L.VK 11^ I^IV/ fc» TWA— 


NM_005077 SEQ ID NO 1118


PnntinR0809 RC 

vUl IUywwV/w^ 1 \w 


SEQ ID NO 2463 


NM_005080 SEQ ID NO 1119


Contig50822_RC SEQ ID NO 2464


NM_005084 SEQ ID NO 1120
Contig50850_RC SEQ ID NO 2466
NM_005130 SEQ ID NO 1122
Contig50860_RC SEQ ID NO 2467


NM_005139


oca m MO 119*3 

OCU IL/ INU I IZO 


PnntinflOQIS RC 

vUl Hiy \J\J%3 1 v/ 1 \v/ 


SEQ ID NO 2468 


NM_005168


qpA m MO 119R 
OlU IL/ INU I I ^w 


Cnntln^OQSO RC 

V-/V/I liiywV/wwV/ l xw 


SEQ ID NO 2469 

\J fa— 1 fa/ 1 » V/ • Vp^ 


NM_005190


qpA m MO 119fi 

OtZU IL/ INU I 


CnntinS1066 RC 


SEQ ID NO 2470 

VL»M( 1 W • » V/ • * w 


NM_005196


oca m MO 1197 
OlU IL/ INU I l^# 


Pnntin^1105 RC 
uUi my w i i v/w t xv-/ 


SEQ ID NO 2472 


NM_005213 SEQ ID NO 1128
Contig51117_RC SEQ ID NO 2473
NM_005218 SEQ ID NO 1129
Contig51196_RC SEQ ID NO 2474


NM 

IN IVI 




<^ c o in no n?o 

OlU IL/ INU 1 lOv 


Contig51235_RC SEQ ID NO 2475


MM 
IN IVI 




oca m MO 11*31 

OlU IL/ INU I IO 1 


PnntinR19S4 RC 


SEQ ID NO 2476 


NM_005249


oca in MO 11^9 

OlU IU INU I I OZ 


Pnntin^l ^^9 RC 
UUI illy v/ 1 oj^ rxu 


SEQ ID NO 2477 


NM_005257


OCA I r\ MA 4 

obU IU INU lloo 


fVmtin^l *3fxQ RP 


SEQ ID NO 2478 

OLVjC 1 U/ INV/ fc" I V/ 


NM_005264 SEQ ID NO 1134
Contig51392_RC SEQ ID NO 2479
NM_005271 SEQ ID NO 1135
Contig51403_RC SEQ ID NO 2480
NM_005314 SEQ ID NO 1136
Contig51685_RC SEQ ID NO 2483
NM_005321 SEQ ID NO 1137
Contig51726_RC SEQ ID NO 2484
NM_005322 SEQ ID NO 1138
Contig51742_RC SEQ ID NO 2485








NM_005325


OCU IU l\\J 1 lot? 


Printing 1*7 AG DP 


CCO in MO O/IQR 
otU IU NU 24oO 


NM_005326


ccn in MO 1140 


ooniiyo 1 1 f o r\L/ 


QCO in MO 
OCU IU NU Z40/ 


NM_005335


CCD in MO 11/11 


ouniiyo iouu 


qco m Mn OA QQ 
OtU IU NU 24oo 


NM_005337


ccn in NO. 11A0 


uoniiyo iouy ku 


qco in Mn o/iqo i 
otU IU NU 24oy 


NM_005342


ccn in mo ha^ 

OlU I U l\\J 1 1 *+0 


Oontio^l ft91 DP 
uonuyo lo^ I r\0 


Qpn in Mn o/ion 
otU IU NU 24yu 


NM_005345


cpn in mo 1 * a a 


Pv\rvHn*\ 1 AAA DP 


oca in Mn 0/10*1 
otU IU NU 24yl 


NM_005357


CCA in MO 11A^ 
OCU IU NU I I40 


Pr*n+inf^1 Qfx^ DP 


oca in Kin O/i no 
otU IU NU ^4yo 


NM_005375


CCA in MO 11/Lfi 
OEU IU NU I I40 


Pr*nfirt^1 Oft7 DP 

uoniigo iyo#^_KU 


ccn in Mn o/inc 
otU IU nu ^4yo 


NM_005391


CCn m MO 1 1/17 
OCU iL/ l\U 1 14/ 


P/M-»f?n^*f QQI DP 

uoniigo iyo i^ku 


ccn m Mn o/nc 
otU IU NU 24yb 


NM_005408


cpn in MO 1 1 Aft 

OCU IL/ NU I IH-O 


uoniig o i yy4_KO 


ccn in Mn o/i n7 
otU IU NU £A\d( 


NM_005409


cpn m MO 11AQ 


nnnfirtWHftO DP 


ccn in Mn ovino 
ohU IU NU ^4yo 


NM_005410


ocn in mo ncn 
OtU IU NU 110U 


uontigo2uy4_ ko 


ccn ir\ ma o yi f\c\ 
ocU IU NU 2499 


NM_005426


OCA m MH 

otu iu i\u noi 


uontigo2o2U 


SEQ ID NO 2500


NM_005433


ocn m Kin titzo 
ocU IU NU 1 102 


uontigo2oyo_KU 


SEQ ID NO 2501


NM_005441


CCA in MA *f *l CO 

otU IU INU llOo 


uontigo242o_RC 


SEQ ID NO 2503 


NM_005443


CCA in MA 'iASZA 

otU IU NU 1 104 


LfOntigo24o2_KC 


SEQ ID NO 2504 


NM_005483


oca in MA *i <i 
otU 1 U NU 1 1 00 


uoruigo204o_Ku 


SEQ ID NO 2505


NM_005486


CCA in MA 4 

OtU IU NU 11 00 


uontigo200o__KU 


SEQ ID NO 2506


NM_005496


CCA m MA 1 4 *^"7 

otu iu imu no/ 


uontigo20f y_Ko 


SEQ ID NO 2507


NM_005498


cpn in mo -m^q 
otU IU NU 11 Do 


nnn+in^OCnQ DA 

uontig 02ouo — ko 


SEQ ID NO 2508


NM_005499


Cpn m MO 1 1 

ocu iu nu iioy 


uoniigo20oy__KLr 


SEQ ID NO 2509


NM_005514


cpn m mo 11 fin 

OCU IU NU I IOU 


uoniigo^o4 1 _Ku 


ccn in ma oc-irt 
ohU IU NU 2510 


NM_005531


cpn m MO 11ftO 
OCU IU NU IIOZ 


uontig 02004 


CCA IA KIA A 

ohU IU NO 251 1 


NM_005538


cpn m MO H«Q 
OCU IU NU I I DO 


nAntrrtR07nR DP 


ccn in Kin ocho 
ocu IU NU 2512 


NM_005541


^ c o in mo iifiii 

OCU IU INU I ID*f 




ccn in Kin oc-io 
OtU IU NU 2o1o 


NM_005544


cpn m mo 'i'iRK 

OCU IU NU I I DO 


uoniigo2 / 22__ KLr 


SEQ ID NO 2514


NM_005548


ocn m MO 11£ft 
OCU IU NU I IDD 


uoniigoz / zo_KLr 


SEQ ID NO 2515


NM_005554


cpn m MO 11ft"7 
OCU IU NU I I Of 


Pnn+lnR07AA DA 


ccn in Kin oc-ic 
ocu IU NU 2516 


NM_005555


cpn m MO HfiA 
OCU IU NU I I DO 


PrM-ifin^07"7Q DP 

uonugoz/ #y_KU 


ocn in Kin oc*i7 
obU IU NU 251/ 


NM_005556


cpn m MO 1 1«Q 

ocu iu nu i i oy 


uoniigozyo * _ko 


ccn in Kin och o 
ocu IU NU 2518 


NM_005557


cpn m MO 1 1"7H 
OCU IU NU I I / U 


uonugo2yy4 — ko 


ccn in Kin oc^tn 
obU IU NU 2519 


NM_005558


cpn m mo 4 
OCU IU NU 1 1 ( 1 


AonflAKOnOO DA 


SEQ ID NO 2520


NM_005562


5SFO m MO 1 179 


Pnntin^^n^A DP 


oca m Kin oro*! 1 
OCU IU NU 2021 


NM_005563 SEQ ID NO 1173
Contig53047_RC SEQ ID NO 2522
NM_005565 SEQ ID NO 1174
Contig53130 SEQ ID NO 2523
NM_005566 SEQ ID NO 1175
Contig53183_RC SEQ ID NO 2524
NM_005572 SEQ ID NO 1176
Contig53242_RC SEQ ID NO 2526




NM_005582 SEQ ID NO 1177
Contig53248_RC SEQ ID NO 2527
NM_005608 SEQ ID NO 1178
Contig53260_RC SEQ ID NO 2528
NM_005614 SEQ ID NO 1179
Contig53296_RC SEQ ID NO 2531
NM_005617 SEQ ID NO 1180
Contig53307_RC SEQ ID NO 2532
NM_005620 SEQ ID NO 1181
Contig53314_RC SEQ ID NO 2533
NM_005625 SEQ ID NO 1182
Contig53401_RC SEQ ID NO 2534
NM_005651 SEQ ID NO 1183
Contig53550_RC SEQ ID NO 2535
NM_005658 SEQ ID NO 1184
Contig53551_RC SEQ ID NO 2536
NM_005659 SEQ ID NO 1185
Contig53598_RC SEQ ID NO 2537
NM_005667 SEQ ID NO 1186
Contig53646_RC SEQ ID NO 2538
NM_005686 SEQ ID NO 1187
Contig53658_RC SEQ ID NO 2539
NM_005690 SEQ ID NO 1188
Contig53698_RC SEQ ID NO 2540
NM_005720 SEQ ID NO 1190
Contig53719_RC SEQ ID NO 2541
NM_005727 SEQ ID NO 1191
Contig53742_RC SEQ ID NO 2542
NM_005733 SEQ ID NO 1192
Contig53757_RC SEQ ID NO 2543
NM_005737 SEQ ID NO 1193
Contig53870_RC SEQ ID NO 2544
NM_005742 SEQ ID NO 1194
Contig53952_RC SEQ ID NO 2546
NM_005746 SEQ ID NO 1195
Contig53962_RC SEQ ID NO 2547
NM_005749 SEQ ID NO 1196
Contig53968_RC SEQ ID NO 2548
NM_005760 SEQ ID NO 1197
Contig54113_RC SEQ ID NO 2549
NM_005764 SEQ ID NO 1198
Contig54142_RC SEQ ID NO 2550
NM_005794 SEQ ID NO 1199
Contig54232_RC SEQ ID NO 2551
NM_005796 SEQ ID NO 1200
Contig54242_RC SEQ ID NO 2552
NM_005804 SEQ ID NO 1201
Contig54260_RC SEQ ID NO 2553
NM_005813 SEQ ID NO 1202
Contig54263_RC SEQ ID NO 2554
NM_005824 SEQ ID NO 1203
Contig54295_RC SEQ ID NO 2555
NM_005825 SEQ ID NO 1204
Contig54318_RC SEQ ID NO 2556
NM_005849 SEQ ID NO 1205
Contig54325_RC SEQ ID NO 2557
NM_005853 SEQ ID NO 1206
Contig54389_RC SEQ ID NO 2558
NM_005855 SEQ ID NO 1207
Contig54394_RC SEQ ID NO 2559
NM_005864 SEQ ID NO 1208
Contig54414_RC SEQ ID NO 2560
NM_005874 SEQ ID NO 1209
Contig54425 SEQ ID NO 2561
NM_005876 SEQ ID NO 1210
Contig54477_RC SEQ ID NO 2562
NM_005880 SEQ ID NO 1211
Contig54503_RC SEQ ID NO 2563
NM_005891 SEQ ID NO 1212
Contig54534_RC SEQ ID NO 2564
NM_005892 SEQ ID NO 1213
Contig54560_RC SEQ ID NO 2566
NM_005899 SEQ ID NO 1214
Contig54581_RC SEQ ID NO 2567


NM_005915 SEQ ID NO 1215


Contig54609_RC SEQ ID NO 2568 


NM_005919 SEQ ID NO 1216


Contig54666_RC SEQ ID NO 2569 


NM_005923 SEQ ID NO 1217 


Contig54667_RC SEQ ID NO 2570 


NM_005928 SEQ ID NO 1218 


Contig54726_RC SEQ ID NO 2571 


NM_005932 SEQ ID NO 1219


Contig54742_RC SEQ ID NO 2572


NM_005935 SEQ ID NO 1220 


Contig54745_RC SEQ ID NO 2573 


NM_005945 SEQ ID NO 1221 


Contig54757_RC SEQ ID NO 2574 


NM_005953 SEQ ID NO 1222 


Contig54761_RC SEQ ID NO 2575


NM_005978 SEQ ID NO 1223 


Contig54813_RC SEQ ID NO 2576 


NM_005990 SEQ ID NO 1224 


Contig54867_RC SEQ ID NO 2577 


NM_006002 SEQ ID NO 1225 


Contig54895_RC SEQ ID NO 2578 


NM_006004 SEQ ID NO 1226 


Contig54898_RC SEQ ID NO 2579 


NM_006005 SEQ ID NO 1227


Contig54913_RC SEQ ID NO 2580


NM_006006 SEQ ID NO 1228 


Contig54965_RC SEQ ID NO 2582 


NM_006017 SEQ ID NO 1229 


Contig54968_RC SEQ ID NO 2583 


NM_006018 SEQ ID NO 1230 


Contig55069_RC SEQ ID NO 2584 


NM_006023 SEQ ID NO 1231


Contig55181_RC SEQ ID NO 2585 


NM_006027 SEQ ID NO 1232 


Contig55188_RC SEQ ID NO 2586 


NM_006029 SEQ ID NO 1233 


Contig55221_RC SEQ ID NO 2587 


NM_006033 SEQ ID NO 1234 


Contig55254_RC SEQ ID NO 2588 


NM_006051 SEQ ID NO 1235 


Contig55265_RC SEQ ID NO 2589 


NM_006055 SEQ ID NO 1236 


Contig55377_RC SEQ ID NO 2591 


NM_006074 SEQ ID NO 1237 


Contig55397_RC SEQ ID NO 2592 


NM_006086 SEQ ID NO 1238 


Contig55448_RC SEQ ID NO 2593 


NM_006087 SEQ ID NO 1239 


Contig55468_RC SEQ ID NO 2594 


NM_006096 SEQ ID NO 1240 


Contig55500_RC SEQ ID NO 2595 


NM_006101 SEQ ID NO 1241 


Contig55538_RC SEQ ID NO 2596 


NM_006103 SEQ ID NO 1242 


Contig55558_RC SEQ ID NO 2597 


NM_006111 SEQ ID NO 1243


Contig55606_RC SEQ ID NO 2598 


NM_006113 SEQ ID NO 1244


Contig55674_RC SEQ ID NO 2599 


NM_006115 SEQ ID NO 1245


Contig55725_RC SEQ ID NO 2600 


NM_006117 SEQ ID NO 1246


Contig55728_RC SEQ ID NO 2601 


NM_006142 SEQ ID NO 1247


Contig55756_RC SEQ ID NO 2602 


NM_006144 SEQ ID NO 1248


Contig55769_RC SEQ ID NO 2603 


NM_006148 SEQ ID NO 1249


Contig55771_RC SEQ ID NO 2605 


NM_006153 SEQ ID NO 1250


Contig55813_RC SEQ ID NO 2607


NM_006159 SEQ ID NO 1251


Contig55829_RC SEQ ID NO 2608


NM_006170 SEQ ID NO 1252
Contig55852_RC SEQ ID NO 2609
NM_006197 SEQ ID NO 1253
Contig55883_RC SEQ ID NO 2610
NM_006224 SEQ ID NO 1255
Contig55920_RC SEQ ID NO 2611
NM_006227 SEQ ID NO 1256
Contig55940_RC SEQ ID NO 2612
NM_006235 SEQ ID NO 1257
Contig55950_RC SEQ ID NO 2613
NM_006243 SEQ ID NO 1258
Contig55991_RC SEQ ID NO 2614
NM_006264 SEQ ID NO 1259
Contig55997_RC SEQ ID NO 2615
NM_006271 SEQ ID NO 1261
Contig56023_RC SEQ ID NO 2616
NM_006274 SEQ ID NO 1262
Contig56030_RC SEQ ID NO 2617
NM_006290 SEQ ID NO 1265
Contig56093_RC SEQ ID NO 2618
NM_006291 SEQ ID NO 1266
Contig56205_RC SEQ ID NO 2621
NM_006296 SEQ ID NO 1267
Contig56270_RC SEQ ID NO 2622
NM_006304 SEQ ID NO 1268
Contig56276_RC SEQ ID NO 2623
NM_006314 SEQ ID NO 1269
Contig56291_RC SEQ ID NO 2624
NM_006332 SEQ ID NO 1270
Contig56298_RC SEQ ID NO 2625
NM_006357 SEQ ID NO 1271
Contig56307 SEQ ID NO 2627
NM_006366 SEQ ID NO 1272
Contig56390_RC SEQ ID NO 2628
NM_006372 SEQ ID NO 1273
Contig56434_RC SEQ ID NO 2629
NM_006377 SEQ ID NO 1274
Contig56457_RC SEQ ID NO 2630
NM_006378 SEQ ID NO 1275
Contig56534_RC SEQ ID NO 2631
NM_006383 SEQ ID NO 1276
Contig56670_RC SEQ ID NO 2632
NM_006389 SEQ ID NO 1277
Contig56678_RC SEQ ID NO 2633
NM_006393 SEQ ID NO 1278
Contig56742_RC SEQ ID NO 2634
NM_006398 SEQ ID NO 1279
Contig56759_RC SEQ ID NO 2635
NM_006406 SEQ ID NO 1280
Contig56765_RC SEQ ID NO 2636
NM_006408 SEQ ID NO 1281
Contig56843_RC SEQ ID NO 2637
NM_006410 SEQ ID NO 1282
Contig57011_RC SEQ ID NO 2638
NM_006414 SEQ ID NO 1283
Contig57023_RC SEQ ID NO 2639
NM_006417 SEQ ID NO 1284
Contig57057_RC SEQ ID NO 2640
NM_006430 SEQ ID NO 1285
Contig57076_RC SEQ ID NO 2641
NM_006460 SEQ ID NO 1286
Contig57081_RC SEQ ID NO 2642
NM_006461 SEQ ID NO 1287
Contig57091_RC SEQ ID NO 2643
NM_006469 SEQ ID NO 1288
Contig57138_RC SEQ ID NO 2644
NM_006470 SEQ ID NO 1289
Contig57173_RC SEQ ID NO 2645
NM_006491 SEQ ID NO 1290
Contig57230_RC SEQ ID NO 2646
NM_006495 SEQ ID NO 1291
Contig57258_RC SEQ ID NO 2647
NM_006500 SEQ ID NO 1292
Contig57270_RC SEQ ID NO 2648




NM_006509 | SEQ ID NO 1293 | Contig57272_RC | SEQ ID NO 2649
NM_006516 | SEQ ID NO 1294 | Contig57344_RC | SEQ ID NO 2650
NM_006533 | SEQ ID NO 1295 | Contig57430_RC | SEQ ID NO 2651
NM_006551 | SEQ ID NO 1296 | Contig57458_RC | SEQ ID NO 2652
NM_006556 | SEQ ID NO 1297 | Contig57493_RC | SEQ ID NO 2653
NM_006558 | SEQ ID NO 1298 | Contig57584_RC | SEQ ID NO 2654
NM_006564 | SEQ ID NO 1299 | Contig57595 | SEQ ID NO 2655
NM_006573 | SEQ ID NO 1300 | Contig57602_RC | SEQ ID NO 2656
NM_006607 | SEQ ID NO 1301 | Contig57609_RC | SEQ ID NO 2657
NM_006622 | SEQ ID NO 1302 | Contig57610_RC | SEQ ID NO 2658
NM_006623 | SEQ ID NO 1303 | Contig57644_RC | SEQ ID NO 2659
NM_006636 | SEQ ID NO 1304 | Contig57725_RC | SEQ ID NO 2660
NM_006670 | SEQ ID NO 1305 | Contig57739_RC | SEQ ID NO 2661
NM_006681 | SEQ ID NO 1306 | Contig57825_RC | SEQ ID NO 2662
NM_006682 | SEQ ID NO 1307 | Contig57864_RC | SEQ ID NO 2663
NM_006696 | SEQ ID NO 1308 | Contig57940_RC | SEQ ID NO 2664
NM_006698 | SEQ ID NO 1309 | Contig58260_RC | SEQ ID NO 2665
NM_006705 | SEQ ID NO 1310 | Contig58272_RC | SEQ ID NO 2666
NM_006739 | SEQ ID NO 1311 | Contig58301_RC | SEQ ID NO 2667
NM_006748 | SEQ ID NO 1312 | Contig58368_RC | SEQ ID NO 2668
NM_006759 | SEQ ID NO 1313 | Contig58471_RC | SEQ ID NO 2669
NM_006762 | SEQ ID NO 1314 | Contig58755_RC | SEQ ID NO 2671
NM_006763 | SEQ ID NO 1315 | Contig59120_RC | SEQ ID NO 2672
NM_006769 | SEQ ID NO 1316 | Contig60157_RC | SEQ ID NO 2673
NM_006770 | SEQ ID NO 1317 | Contig60864_RC | SEQ ID NO 2676
NM_006780 | SEQ ID NO 1318 | Contig61254_RC | SEQ ID NO 2677
NM_006787 | SEQ ID NO 1319 | Contig61815 | SEQ ID NO 2678
NM_006806 | SEQ ID NO 1320 | Contig61975 | SEQ ID NO 2679
NM_006813 | SEQ ID NO 1321 | Contig62306 | SEQ ID NO 2680
NM_006825 | SEQ ID NO 1322 | Contig62568_RC | SEQ ID NO 2681
NM_006826 | SEQ ID NO 1323 | Contig62922_RC | SEQ ID NO 2682
NM_006829 | SEQ ID NO 1324 | Contig62964_RC | SEQ ID NO 2683
NM_006834 | SEQ ID NO 1325 | Contig63520_RC | SEQ ID NO 2685
NM_006835 | SEQ ID NO 1326 | Contig63649_RC | SEQ ID NO 2686
NM_006840 | SEQ ID NO 1327 | Contig63683_RC | SEQ ID NO 2687
NM_006845 | SEQ ID NO 1328 | Contig63748_RC | SEQ ID NO 2688
NM_006847 | SEQ ID NO 1329 | Contig64502 | SEQ ID NO 2689




NM_006851 | SEQ ID NO 1330 | Contig64688 | SEQ ID NO 2690
NM_006855 | SEQ ID NO 1331 | Contig64775_RC | SEQ ID NO 2691
NM_006864 | SEQ ID NO 1332 | Contig65227 | SEQ ID NO 2692
NM_006868 | SEQ ID NO 1333 | Contig65663 | SEQ ID NO 2693
NM_006875 | SEQ ID NO 1334 | Contig65785_RC | SEQ ID NO 2694
NM_006889 | SEQ ID NO 1336 | Contig65900 | SEQ ID NO 2695
NM_006892 | SEQ ID NO 1337 | Contig66219_RC | SEQ ID NO 2696
NM_006912 | SEQ ID NO 1338 | Contig66705_RC | SEQ ID NO 2697
NM_006931 | SEQ ID NO 1341 | Contig66759_RC | SEQ ID NO 2698
NM_006941 | SEQ ID NO 1342 | Contig67182_RC | SEQ ID NO 2699
NM_006943 | SEQ ID NO 1343







Identifier | Correlation | Name | Description
NM_002051 | 0.763977 | GATA3 | GATA-binding protein 3
AB020689 | 0.753592 | KIAA0882 | KIAA0882 protein
NM_001218 | 0.753225 | CA12 | carbonic anhydrase XII
NM_000125 | 0.748421 | ESR1 | estrogen receptor 1
Contig56678_RC | 0.747816 | | ESTs
NM_004496 | 0.729116 | HNF3A | hepatocyte nuclear factor 3, alpha
NM_017732 | 0.713398 | FLJ20262 | hypothetical protein FLJ20262
NM_006806 | -0.712678 | BTG3 | BTG family, member 3
Contig56390_RC | 0.705940 | | ESTs
Contig37571_RC | 0.704468 | | ESTs
NM_004559 | -0.701617 | NSEP1 | nuclease sensitive element binding protein 1
Contig50153_RC | -0.696652 | | ESTs, Weakly similar to LKHU proteoglycan link protein precursor [H.sapiens]
NM_012155 | 0.694332 | EMAP-2 | microtubule-associated protein like
| U.DOl *tO\J | PI \0i"iO7 rLJi 1 1 1 | nypoineiiccii protein rLJZinz#
NM_019063 | -0.686064 | C2ORF2 | chromosome 2 open reading frame 2
mm 01 221 Q | | | uiuouit? i\no LiiiuuycntJ nurnoioy
NM_001982 | 0.676114 | ERBB3 | v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 3
NM_006623 | -0.675090 | PHGDH | phosphoglycerate dehydrogenase
NM_000636 | -0.674282 | SOD2 | superoxide dismutase 2, mitochondrial
NM_006017 | -0.670353 | PROML1 | prominin (mouse)-like 1
Contig57940_RC | 0.667915 | MAP-1 | MAP-1 protein
Contig46934_RC | 0.666908 | | ESTs, Weakly similar to JE0350 Anterior gradient-2 [H.sapiens]
NM_005080 | 0.665772 | XBP1 | X-box binding protein 1
NM_014246 | 0.665725 | CELSR1 | cadherin, EGF LAG seven-pass G-type receptor 1, flamingo (Drosophila) homolog
Contig54667_RC | -0.663727 | | Human DNA sequence from clone RP1-187J11 on chromosome 6q11.1-22.33. Contains the gene for a novel protein similar to S. pombe and S. cerevisiae predicted proteins, the gene for a novel protein similar to protein kinase C inhibitors, the 3' end of the gene for a novel protein similar to Drosophila L82 and predicted worm proteins, ESTs, STSs, GSSs and two putative CpG islands
Contig51994_RC | 0.663715 | | ESTs, Weakly similar to B0416.1 [C.elegans]
NM_016337 | 0.663006 | RNB6 | RNB6
NM_015640 | -0.660165 | PAI-RBP1 | PAI-1 mRNA-binding protein
X07834 | -0.657798 | SOD2 | superoxide dismutase 2, mitochondrial
NM_012319 | 0.657666 | LIV-1 | LIV-1 protein, estrogen regulated
Contig41887_RC | 0.656042 | | ESTs, Weakly similar to Homolog of rat Zymogen granule membrane protein [H.sapiens]
NM_003462 | 0.655349 | P28 | dynein, axonemal, light intermediate polypeptide
Contig58301_RC | 0.654268 | | Homo sapiens mRNA; cDNA DKFZp667D095 (from clone DKFZp667D095)
NM_005375 | 0.653783 | MYB | v-myb avian myeloblastosis viral oncogene homolog
NM_017447 | -0.652445 | YG81 | hypothetical protein LOC54149
Contig924_RC | -0.650658 | | ESTs
M55914 | -0.650181 | MPB1 | MYC promoter-binding protein 1
NM_006004 | -0.649819 | UQCRH | ubiquinol-cytochrome c reductase hinge protein
| | RARA | retinoic acid receptor alpha
| | HSU79303 | protein predicted by clone 23882
| _n fi4.74.03 | | peptidyl arginine deiminase, type II
| _n fi4«4i2 | LOC51323 | hypothetical protein
K02403 | 0.645532 | C4A | complement component 4A
NM_016405 | -0.642201 | HSU93243 | Ubc6p homolog
Contig46597_RC | 0.641733 | | ESTs
Contig55377_RC | 0.640310 | | ESTs
NM_001207 | 0.637800 | BTF3 | basic transcription factor 3
NIVI u I O 1 Uw | 0.636422 | FLJ10647 | hypothetical protein FLJ10647
AL110202 | -0.635398 | | Homo sapiens mRNA; cDNA DKFZp586I2022 (from clone DKFZp586I2022)
AL133105 | -0.635201 | DKFZp434F2322 | hypothetical protein DKFZp434F2322
| -U.OOU 1 Oi7 | RBMS1 | RNA binding motif, single stranded interacting protein 1
out luy %jo iou | -0.634812 | | ESTs, Weakly similar to hyperpolarization-activated cyclic nucleotide-gated channel hHCN2 [H.sapiens]
NM_018014 | -0.634460 | BCL11A | B-cell CLL/lymphoma 11A (zinc finger protein)
NM_006769 | -0.632197 | LMO4 | LIM domain only 4
| 0.631170 | JCL-1 | hepatocellular carcinoma associated protein; breast cancer associated gene 1
Contig49233_RC | -0.631047 | | Homo sapiens, Similar to nuclear receptor binding factor 2, clone IMAGE:3463191, mRNA, partial cds
AL133033 | 0.629690 | KIAA1025 | KIAA1025 protein
AL049265 | 0.629414 | | Homo sapiens mRNA; cDNA DKFZp564F053 (from clone DKFZp564F053)
NM_018728 | 0.627989 | MYO5C | myosin 5C
NM_004780 | 0.627856 | TCEAL1 | transcription elongation factor A (SII)-like 1
Contig760_RC | 0.627132 | | ESTs
Contig399_RC | 0.626543 | FLJ12538 | hypothetical protein FLJ12538 similar to ras-related protein RAB17
M83822 | 0.625092 | CDC4L | cell division cycle 4-like
NM_001255 | -0.625089 | CDC20 | CDC20 (cell division cycle 20, S. cerevisiae, homolog)
NM_006739 | -0.624903 | MCM5 | minichromosome maintenance deficient (S. cerevisiae) 5 (cell division cycle 46)
NM_002888 | -0.624664 | RARRES1 | retinoic acid receptor responder (tazarotene induced) 1
NM_003197 | 0.623850 | TCEB1L | transcription elongation factor B (SIII), polypeptide 1-like
NM_006787 | 0.623625 | JCL-1 | hepatocellular carcinoma associated protein; breast cancer associated gene 1
Contig49342_RC | 0.622179 | | ESTs
AL133619 | 0.621719 | | Homo sapiens mRNA; cDNA DKFZp434E2321 (from clone DKFZp434E2321); partial cds
AL133622 | 0.621577 | KIAA0876 | KIAA0876 protein
NM_004648 | -0.621532 | PTPNS1 | protein tyrosine phosphatase, non-receptor type substrate 1
NM_001793 | -0.621530 | CDH3 | cadherin 3, type 1, P-cadherin (placental)
NM_003217 | 0.620915 | TEGT | testis enhanced gene transcript (BAX inhibitor 1)
NM_001551 | 0.620832 | IGBP1 | immunoglobulin (CD79A) binding protein 1
NM_002539 | -0.620683 | ODC1 | ornithine decarboxylase 1
Contig55997_RC | -0.619932 | | ESTs
NM_000633 | 0.619547 | BCL2 | B-cell CLL/lymphoma 2
NM_016267 | -0.619096 | TONDU | TONDU
Contig3659_RC | 0.618048 | FLJ21174 | hypothetical protein FLJ21174
NM_000191 | 0.617250 | HMGCL | 3-hydroxymethyl-3-methylglutaryl-Coenzyme A lyase (hydroxymethylglutaricaciduria)
NM_001267 | 0.616890 | CHAD | chondroadherin
Pnntin^Q090 RC | 0.616385 | | ESTs
| -0.616268 | HSSG1 | heat-shock suppressed protein 1
| 0.616015 | FLJ21603 | hypothetical protein FLJ21603
NM_001428 | -0.615855 | ENO1 | enolase 1, (alpha)
Contig51369_RC | 0.615466 | | ESTs
Pnntin^6647 RC | 0.615310 | GFRA1 | GDNF family receptor alpha 1
NM_014096 | -0.614832 | PRO1659 | PRO1659 protein
NM 01^9^7 | 0.614735 | LOC51604 | CGI-06 protein
Contig49790_RC | -0.614463 | | ESTs
NM_006759 | -0.614279 | UGP2 | UDP-glucose pyrophosphorylase 2
Contig53598_RC | -0.613787 | FLJ11413 | hypothetical protein FLJ11413
AF113132 | -0.613561 | PSA | phosphoserine aminotransferase
AK000004 | 0.613001 | | Homo sapiens mRNA for FLJ00004 protein, partial cds
Contig52543_RC | 0.612960 | | Homo sapiens cDNA FLJ13945 fis, clone Y79AA1000969
AB032966 | -0.611917 | KIAA1140 | KIAA1140 protein
AL080192 | 0.611544 | | Homo sapiens cDNA: FLJ21238 fis, clone COL01115
X56807 | -0.610654 | DSC2 | desmocollin 2
Contig30390_RC | 0.609614 | | ESTs
AL137362 | 0.609121 | FLJ22237 | hypothetical protein FLJ22237
NM_014211 | -0.608585 | GABRP | gamma-aminobutyric acid (GABA) A receptor, pi
NM_006696 | 0.608474 | SMAP | thyroid hormone receptor coactivating protein
Contig45588_RC | -0.608273 | | Homo sapiens cDNA: FLJ22610 fis, clone rioiLKfyou
NM_003358 | 0.608244 | UGCG | UDP-glucose ceramide glucosyltransferase
NM_006153 | -0.608129 | NCK1 | NCK adaptor protein 1
NM_001453 | -0.606939 | FOXC1 | forkhead box C1
Contig54666_RC | 0.606475 | | ov65e02.x1 NCI_CGAP_CLL1 Homo sapiens cDNA clone IMAGE:1670714 3' similar to TR:Q29168 Q29168 UNKNOWN PROTEIN mRNA sequence.
NM_005945 | -0.605945 | MPB1 | MYC promoter-binding protein 1
Contig55725_RC | -0.605841 | | ESTs, Moderately similar to "PSORA'S hypothetical protein DKFZp762L0311.1 [H.sapiens]
Contig37015_RC | -0.605780 | | ESTs, Weakly similar to UAS3_HUMAN UBASH3A PROTEIN [H.sapiens]
A 1 1^7480 | | OnODr 1 | ono-uui iidin uuiuiny pruioin i
NM 00^9 R | | n ii j | n i nistone family, memuer i
NM_001446 | -0.604061 | FABP7 | fatty acid binding protein 7, brain
Contig263_RC | 0.603318 | | Homo sapiens cDNA: FLJ23000 fis, clone LNG00194
Contig8347_RC | -0.603311 | | ESTs
NM_002988 | -0.603279 | SCYA18 | small inducible cytokine subfamily A (Cys-Cys), member 18, pulmonary and activation-regulated
AF111849 | 0.603157 | HELO1 | homolog of yeast long chain polyunsaturated fatty acid elongation enzyme 2
NM_014700 | 0.603042 | KIAA0665 | KIAA0665 gene product
NM_001814 | -0.602988 | CTSC | cathepsin C
AF116682 | -0.602350 | PRO2013 | hypothetical protein PRO2013
AB037836 | 0.602024 | KIAA1415 | KIAA1415 protein
AB002301 | 0.602005 | KIAA0303 | KIAA0303 protein
NM_002996 | -0.601841 | SCYD1 | small inducible cytokine subfamily D (Cys-X3-Cys), member 1 (fractalkine, neurotactin)
NM_018410 | -0.601765 | DKFZp762E1312 | hypothetical protein DKFZp762E1312
Contig49581_RC | -0.601571 | KIAA1350 | KIAA1350 protein
NM_003088 | -0.601458 | SNL | singed (Drosophila)-like (sea urchin fascin homolog like)
Contig47045_RC | 0.601088 | | ESTs, Weakly similar to DP1_HUMAN POLYPOSIS LOCUS PROTEIN 1 [H.sapiens]
NM_001806 | -0.600954 | CEBPG | CCAAT/enhancer binding protein (C/EBP), gamma
NM_004374 | 0.600766 | COX6C | cytochrome c oxidase subunit VIc
Contig52641_RC | 0.600132 | | ESTs, Weakly similar to CENB_MOUSE MAJOR CENTROMERE AUTOANTIGEN B [M.musculus]
NM_000100 | -0.600127 | CSTB | cystatin B (stefin B)
NM_002250 | -0.600004 | KCNN4 | potassium intermediate/small conductance calcium-activated channel, subfamily N, member 4
AB033035 | -0.599423 | KIAA1209 | KIAA1209 protein
Contig53968_RC | 0.599077 | | ESTs
NM_002300 | -0.598246 | LDHB | lactate dehydrogenase B
NM_000507 | 0.598110 | FBP1 | fructose-1,6-bisphosphatase 1
NM_002053 | -0.597756 | GBP1 | guanylate binding protein 1, interferon-inducible, 67kD
AB007883 | 0.597043 | KIAA0423 | KIAA0423 protein
NM_004900 | -0.597010 | DJ742C19.2 | phorbolin (similar to apolipoprotein B mRNA editing protein)
NM_004480 | 0.596321 | FUT8 | fucosyltransferase 8 (alpha (1,6) fucosyltransferase)
Contig35896_RC | 0.596281 | | ESTs
NM_020974 | 0.595173 | CEGP1 | CEGP1 protein
NM_000662 | 0.595114 | NAT1 | N-acetyltransferase 1 (arylamine N-acetyltransferase)
NM_006113 | 0.595017 | VAV3 | vav 3 oncogene
NM_014865 | -0.594928 | KIAA0159 | chromosome condensation-related SMC-associated protein 1
Contig55538_RC | -0.594573 | BA395L14.2 | hypothetical protein bA395L14.2
NM_016056 | 0.594084 | LOC51643 | CGI-119 protein
NM_003579 | -0.594063 | RAD54L | RAD54 (S.cerevisiae)-like
NM_014214 | -0.593860 | IMPA2 | inositol(myo)-1(or 4)-monophosphatase 2
U79293 | 0.593793 | | Human clone 23948 mRNA sequence
NM_005557 | -0.593746 | KRT16 | keratin 16 (focal non-epidermolytic palmoplantar keratoderma)
NM_002444 | -0.592405 | MSN | moesin
NM_003681 | -0.592155 | PDXK | pyridoxal (pyridoxine, vitamin B6) kinase
NM_006372 | -0.591711 | NSAP1 | NS1-associated protein 1
NM_005218 | -0.591192 | DEFB1 | defensin, beta 1
NM_004642 | -0.591081 | DOC1 | deleted in oral cancer (mouse, homolog) 1
nL I OOu / *r | | | clone HEP20959
M73547 | 0.590317 | D5S346 | DNA segment, single copy probe LNS-CAI/LNS-CAII (deleted in polyposis
Contig65663 | 0.590312 | | ESTs
AL035297 | -0.589728 | | H.sapiens gene from PAC 747L4
LrOntigoOu29_RC | O.Oo93oo | | ESTs
| 0.OOOOD2 | CI IOH070 | hypothetical protein
NM_012425 | -0.588804 | | Homo sapiens Ras suppressor protein 1 (RSU1), mRNA
NM_020179 | -0.588326 | FN5 | FN5 protein
AF090913 | -0.587275 | TMSB10 | thymosin, beta 10
NM_004176 | 0.587190 | SREBF1 | sterol regulatory element binding transcription factor 1
NM_016121 | 0.586941 | LOC51133 | NY-REN-45 antigen
NM_014773 | 0.5oOo71 | KIAA0141 | KIAA0141 gene product
NM_019000 | 0.586677 | FLJ20152 | hypothetical protein
NM_016243 | 0.585942 | LOC51706 | cytochrome b5 reductase 1 (B5R.1)
NM_014274 | -0.585815 | ABP/ZF | Alu-binding protein with zinc finger domain
NM_018379 | 0.585497 | FLJ11280 | hypothetical protein FLJ11280
AL157431 | -0.585077 | DKFZp762A227 | hypothetical protein DKFZp762A227
D38521 | -0.584684 | KIAA0077 | KIAA0077 protein
NM_002570 | 0.584272 | PACE4 | paired basic amino acid cleaving system 4
NM_001809 | -0.584252 | CENPA | centromere protein A (17kD)
NM_003318 | -0.583556 | TTK | TTK protein kinase
NM_014325 | -0.583555 | CORO1C | coronin, actin-binding protein, 1C
NM_005667 | 0.583376 | ZFP103 | zinc finger protein homologous to Zfp103 in mouse
NM_004354 | 0.582420 | CCNG2 | cyclin G2
NM_003670 | 0.582235 | BHLHB2 | basic helix-loop-helix domain containing, class B, 2
NM_001673 | -0.581902 | ASNS | asparagine synthetase
NM_001333 | -0.581402 | CTSL2 | cathepsin L2
Contig54295_RC | 0.581256 | | ESTs
Contig33998_RC | 0.581018 | | ESTs
NM_006002 | -0.580592 | UCHL3 | ubiquitin carboxyl-terminal esterase L3 (ubiquitin thiolesterase)
NM_015392 | 0.580568 | NPDC1 | neural proliferation, differentiation and control, 1
NM_004866 | 0.580138 | SCAMP1 | secretory carrier membrane protein 1
Contig50391_RC | 0.580071 | | ESTs
NM_000592 | 0.579965 | C4B | complement component 4B
Contig50802_RC | 0.579881 | | ESTs
Contig41635_RC | -0.579468 | | ESTs
NM_006845 | -0.579339 | KNSL6 | kinesin-like 6 (mitotic centromere-associated kinesin)
NM_003720 | -0.579296 | DSCR2 | Down syndrome critical region gene 2
NM_000060 | 0.578967 | BTD | biotinidase
AL050388 | -0.578736 | | Homo sapiens mRNA; cDNA DKFZp564M2422 (from clone DKFZp564M2422); partial cds
NM_003772 | -0.578395 | JRKL | jerky (mouse) homolog-like
NM_014398 | -0.578388 | TSC403 | similar to lysosome-associated membrane glycoprotein
NM_001280 | 0.578213 | CIRBP | cold inducible RNA-binding protein
NM_001395 | -0.577369 | DUSP9 | dual specificity phosphatase 9
NM_016229 | -0.576290 | LOC51700 | cytochrome b5 reductase b5R.2
NM_006096 | -0.575615 | NDRG1 | N-myc downstream regulated
NM_001552 | 0.575438 | IGFBP4 | insulin-like growth factor-binding protein 4
NM_005558 | -0.574818 | LAD1 | ladinin 1
Contig54534_RC | 0.574784 | | Human glucose transporter pseudogene
Contig1239_RC | 0.573822 | | Human Chromosome 16 BAC clone CIT987SK-A-362G6
Contig57173_RC | 0.573807 | | Homo sapiens mRNA for KIAA1737 protein, partial cds
NM_004414 | -0.573538 | DSCR1 | Down syndrome critical region gene 1
NM_021103 | -0.572722 | TMSB10 | thymosin, beta 10
NM_002350 | -0.571917 | LYN | v-yes-1 Yamaguchi sarcoma viral related oncogene homolog
Contig51235_RC | 0.571049 | | Homo sapiens cDNA: FLJ23388 fis, clone HEP17008
NM_013384 | 0.570987 | TMSG1 | tumor metastasis-suppressor
NM_014399 | 0.570936 | NET-6 | tetraspan NET-6 protein
Contig26022_RC | -0.570851 | | ESTs
AB023152 | 0.570561 | KIAA0935 | KIAA0935 protein
NM_021077 | -0.569944 | NMB | neuromedin B
NM_003498 | -0.569129 | SNN | stannin
U17077 | -0.568979 | BENE | BENE protein
D86985 | 0.567698 | KIAA0232 | KIAA0232 gene product
NM_006357 | -0.567513 | UBE2E3 | ubiquitin-conjugating enzyme E2E 3 (homologous to yeast UBC4/5)
AL049397 | -0.567434 | | Homo sapiens mRNA; cDNA DKFZp586C1019 (from clone DKFZp586C1019)
Contig64502 | 0.567433 | | ESTs, Weakly similar to unknown [M.musculus]
Contig56298_RC | -0.566892 | FLJ13154 | hypothetical protein FLJ13154
Contig46056_RC | 0.566634 | | ESTs, Weakly similar to YZ28_HUMAN HYPOTHETICAL PROTEIN ZAP128 [H.sapiens]
AF007153 | 0.566044 | | Homo sapiens clone 23736 mRNA sequence
Contig1778_RC | -0.565789 | | ESTs
NM_017702 | -0.565789 | FLJ20186 | hypothetical protein FLJ20186
Contig39226_RC | 0.565761 | | Homo sapiens cDNA FLJ12187 fis, clone MAMMA1000831
NM_000168 | 0.564879 | GLI3 | GLI-Kruppel family member GLI3 (Greig cephalopolysyndactyly syndrome)
wrOntigo ( ouy_Ko | | | ESTs, Weakly similar to T2D3_HUMAN TRANSCRIPTION INITIATION FACTOR TFIID 135 KDA SUBUNIT [H.sapiens]
U45975 | 0.564602 | PIB5PA | phosphatidylinositol (4,5) bisphosphate 5-phosphatase, A
AF038182 | 0.564596 | | Homo sapiens clone 23860 mRNA sequence
Contig5348_RC | 0.564480 | | ESTs, Weakly similar to 1607338A transcription factor BTF3a [H.sapiens]
NM_001321 | -0.564459 | CSRP2 | cysteine and glycine-rich protein 2
Contig25362_RC | -0.563801 | | ESTs
NM_001609 | 0.563782 | ACADSB | acyl-Coenzyme A dehydrogenase, short/branched chain
Contig40146 | 0.563731 | | wi84e12.x1 NCI_CGAP_Kid12 Homo sapiens cDNA clone IMAGE:2400046 3' similar to SW:RASD_DICDI P03967 RAS-LIKE PROTEIN RASD ;, mRNA sequence.
NM_016002 | 0.563403 | LOC51097 | CGI-49 protein
Contig34303_RC | 0.563157 | | Homo sapiens clmna. tlj^ i o i r no, clone COL05829
Contig55883_RC | 0.563141 | | ESTs
NM_017961 | 0.562479 | FLJ20813 | hypothetical protein FLJ20813
M21551 | -0.562340 | NMB | neuromedin B
Contig3940_RC | -0.561956 | YWHAH | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, eta polypeptide
AB033111 | -0.561746 | KIAA1285 | KIAA1285 protein
Contig43410_RC | 0.561678 | | ESTs
Contig42006_RC | -0.561677 | | ESTs
Contig57272_RC | 0.561228 | | ESTs
| -U.OD I UOO | YWHAH | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, eta polypeptide
NM_005915 | -0.560813 | MCM6 | minichromosome maintenance deficient (mis5, S. pombe) 6
NM_003875 | -0.560668 | GMPS | guanine monphosphate synthetase
AK000142 | 0.559651 | AK000142 | Homo sapiens cDNA FLJ20135 fis, clone COL06818.
NM_002709 | -0.559621 | PPP1CB | protein phosphatase 1, catalytic subunit, beta isoform
NM_001276 | -0.558868 | CHI3L1 | chitinase 3-like 1 (cartilage glycoprotein-39)
NM_002857 | 0.558862 | PXF | peroxisomal farnesylated protein
Contig33815_RC | -0.558741 | FLJ22833 | hypothetical protein FLJ22833
NM_003740 | -0.558491 | KCNK5 | potassium channel, subfamily K, member 5 (TASK-2)
Contig53646_RC | 0.558455 | | ESTs
NM_005538 | -0.558350 | INHBC | inhibin, beta C
NM_002111 | 0.557860 | HD | huntingtin (Huntington disease)
NM_003683 | -0.557807 | D21S2056E | DNA segment on chromosome 21 (unique) 2056 expressed sequence
NM_003035 | -0.557380 | SIL | TAL1 (SCL) interrupting locus
L/Oniig4ooo ku | -U.DO/ dt\ 0 | | Homo sapiens, Similar to integral membrane protein 3, clone MGC:3011, mRNA, complete cds
Contig38288_RC | -0.556426 | | ESTs, Weakly similar to ISHUSS protein disulfide-isomerase [H.sapiens]
NM_015417 | 0.556184 | DKFZP434I114 | DKFZP434I114 protein
NM_015507 | -0.556138 | EGFL6 | EGF-like-domain, multiple 6
AF279865 | 0.555951 | KIF13B | kinesin family member 13B
Contig31288_RC | -0.555754 | | ESTs
NM_002966 | -0.555620 | S100A10 | S100 calcium-binding protein A10 (annexin II ligand, calpactin I, light polypeptide (p11))
NM_017585 | -0.555476 | SLC2A6 | solute carrier family 2 (facilitated glucose transporter), member 6
NM_013296 | -0.555367 | HSU54999 | LGN protein
NM_000224 | 0.554838 | KRT18 | keratin 18
Contig49270_RC | -0.554593 | KIAA1553 | KIAA1553 protein
NM_004848 | -0.554538 | ICB-1 | basement membrane-induced gene
NM_007275 | 0.554278 | FUS1 | lung cancer candidate
NM_007044 | -0.553550 | KATNA1 | katanin p60 (ATPase-containing) subunit A 1
Contig1829 | 0.553317 | | ESTs
AF272357 | 0.553286 | NPDC1 | neural proliferation, differentiation and control, 1
Contig57584_RC | -0.553080 | | Homo sapiens, Similar to gene rich cluster, C8 gene, clone MGC:2577, mRNA, complete cds
NM_003039 | -0.552747 | SLC2A5 | solute carrier family 2 (facilitated glucose transporter), member 5
NM_014216 | 0.552321 | ITPK1 | inositol 1,3,4-triphosphate 5/6 kinase
NM_007027 | -0.552064 | TOPBP1 | topoisomerase (DNA) II binding protein
AF118224 | -0.551916 | ST14 | suppression of tumorigenicity 14 (colon carcinoma, matriptase, epithin)
X75315 | -0.551853 | HSRNASEB | seb4D
NM_012101 | -0.551 oZ4 | ATDC | ataxia-telangiectasia group D-associated protein
AL157482 | -0.551329 | FLJ23399 | hypothetical protein FLJ23399
NM_012474 | -0.551150 | UMPK | uridine monophosphate kinase
Contig57081_RC | 0.551103 | | ESTs
NM_006941 | -0.551069 | SOX10 | SRY (sex determining region Y)-box 10
NM_004694 | 0.550932 | SLC16A6 | solute carrier family 16 (monocarboxylic acid transporters), member 6
Contig9541_RC | 0.550680 | | ESTs
Contig20617_RC | 0.550546 | | ESTs
NM_004252 | 0.550365 | SLC9A3R1 | solute carrier family 9 (sodium/hydrogen exchanger), isoform 3 regulatory factor 1
NM_015641 | -0.550200 | DKFZP586B2022 | testin
NM_004336 | -0.550164 | BUB1 | budding uninhibited by benzimidazoles 1 (yeast homolog)
Contig39960_RC | -0.549951 | FLJ21079 | hypothetical protein FLJ21079
NM_020686 | 0.549659 | NPD009 | NPD009 protein
NM_002633 | -0.549647 | PGM1 | phosphoglucomutase 1
Contig30480_RC | 0.548932 | | ESTs
NM_003479 | 0.548896 | PTP4A2 | protein tyrosine phosphatase type IVA, member 2
NM_001679 | -0.548768 | ATP1B3 | ATPase, Na+/K+ transporting, beta 3 polypeptide
NM_001124 | -0.548601 | ADM | adrenomedullin
NM_001216 | -0.548375 | CA9 | carbonic anhydrase IX
U58033 | -0.548354 | MTMR2 | myotubularin related protein 2
NM_018389 | -0.547875 | FLJ11320 | hypothetical protein FLJ11320
AF176012 | 0.547867 | JDP1 | J domain containing protein 1
Contig66705_RC | -0.546926 | ST5 | suppression of tumorigenicity 5
NM_018194 | 0.546878 | FLJ10724 | hypothetical protein FLJ10724
NM_006851 | -0.546823 | RTVP1 | glioma pathogenesis-related protein
Contig53870_RC | 0.546756 | | ESTs
NM_002482 | -0.546012 | NASP | nuclear autoantigenic sperm protein (histone-binding)
NM_002292 | 0.545949 | LAMB2 | laminin, beta 2 (laminin S)
NM_014696 | -0.545758 | KIAA0514 | KIAA0514 gene product
Contig49855 | 0.545517 | | ESTs
AL117666 | 0.545203 | DKFZP586O1624 | DKFZP586O1624 protein
NM_004701 | -0.545185 | CCNB2 | cyclin B2
NM_007050 | 0.544890 | PTPRT | protein tyrosine phosphatase, receptor type, T
NM_000414 | 0.544778 | HSD17B4 | hydroxysteroid (17-beta) dehydrogenase 4
Contig52398_RC | -0.544775 | | Homo sapiens cDNA: FLJ21950 fis, clone HEP04949
AB007916 | 0.544496 | KIAA0447 | KIAA0447 gene product
Contig66219_RC | 0.544467 | FLJ22402 | hypothetical protein FLJ22402
D87453 | 0.544145 | KIAA0264 | KIAA0264 protein
NM_015515 | -0.543929 | DKFZP434G032 | DKFZP434G032 protein
NM_001530 | -0.543898 | HIF1A | hypoxia-inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)
NM_004109 | -0.543893 | FDX1 | ferredoxin 1
NM_000381 | -0.543871 | MID1 | midline 1 (Opitz/BBB syndrome)
Contig43983_RC | 0.543523 | CS2 | calsyntenin-2
Ml_ 1 0 1 (vl | | | Homo sapiens mRNA; cDNA DKFZp586L2424 (from clone DKFZp586L2424)
NM_005764 | -0.543175 | DD96 | epithelial protein up-regulated in carcinoma, membrane associated protein 17
Contig1838_RC | 0.542996 | | Homo sapiens cDNA: FLJ22722 fis, clone HSI14444
NM_006670 | 0.542932 | 5T4 | 5T4 oncofetal trophoblast glycoprotein
Contig28552_RC | -0.542617 | | Homo sapiens mRNA; cDNA DKFZp434C0931 (from clone DKFZp434C0931); partial cds
Contig14284_RC | 0.542224 | | ESTs
NM_006290 | -0.542115 | TNFAIP3 | tumor necrosis factor, alpha-induced protein 3
AL050372 | 0.541463 | | Homo sapiens mRNA; cDNA DKFZp434A091 (from clone DKFZp434A091); partial cds
NM_014181 | -0.541095 | HSPC159 | HSPC159 protein
Contig37141_RC | 0.540990 | | Homo sapiens cDNA: FLJ23582 fis, clone LNG13759
NM_000947 | -0.540621 | PRIM2A | primase, polypeptide 2A (58kD)
NM_002136 | 0.540572 | HNRPA1 | heterogeneous nuclear ribonucleoprotein A1
NM_004494 | -0.540543 | HDGF | hepatoma-derived growth factor (high-mobility group protein 1-like)
Contig38983_RC | 0.540526 | | ESTs
Contig27882_RC | -0.540506 | | ESTs
Z11887 | -0.540020 | MMP7 | matrix metalloproteinase 7 (matrilysin, uterine)
NM_014575 | -0.539725 | SCHIP-1 | schwannomin interacting protein 1
Contig38170_RC | 0.539708 | | ESTs
Contig44064_RC | 0.539403 | | ESTs
U68385 | 0.539395 | MEIS3 | Meis (mouse) homolog 3
Contig51967_RC | 0.538952 | | ESTs
Contig37562_RC | 0.538657 | | ESTs, Weakly similar to transformation-related protein [H.sapiens]
Contig40500_RC | 0.538582 | | ESTs, Weakly similar to unnamed protein product [H.sapiens]
Contig1129_RC | 0.538339 | | ESTs
NM_002184 | 0.538185 | IL6ST | interleukin 6 signal transducer (gp130, oncostatin M receptor)
AL049381 | 0.538041 | | Homo sapiens cDNA PI 11 9Q0O fis, clone NT2RP2004321
NM_002189 | -0.537867 | IL15RA | interleukin 15 receptor, alpha
| "U.JO / JU£. | CHIC2 | cystein-rich hydrophobic domain 2
AB040881 | -0.537473 | KIAA1448 | KIAA1448 protein
NM_016577 | -0.537430 | RAB6B | RAB6B, member RAS oncogene family
NM_001745 | 0.536940 | CAMLG | calcium modulating ligand
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NM_005742 


-0.536738 


P5 


protein disulfide isomerase-related 
protein 




AB011132 


0.536345 


KIAA0560 


KIAA0560 gene product 


5 


Contig54898_RC 


0.536094 


PNN 


pinin, desmosome associated 
protein 




Contig45049_RC 


-0.536043 


FUT4 


fucosyltransferase 4 (alpha (1 ,3) 
fucosyltransferase, myeloid-specific) 




NM_006864 


-0.535924 


LILRB3 


leukocyte immunoglobulin-like 
receptor, subfamily B (with TM and 
ITIM domains), member 3 


10 


Contig53242_RC 


-0.535909 




Homo sapiens cDNA FLJ1 1436 fis, 
clone HEMBA1001213 




NM_005544


a £?qc7<i o 
U.ooD/ 1z 


IRS1


insulin receptor substrate 1 




Contig47456_RC 


0.535493 


CACNA1D 


calcium channel, voltage- 
dependent, L type, alpha 1D subunit










uonug^y i zd_ku 


A COCH Ofi 




ESTs 






0.535067


PDEF 


prostate epithelium-specific Ets 
transcription factor 




iNivi u iz^y 


A KOv1A"7y| 

U.Oo4y/4 


SEC14L2


SEC14 (S. cerevisiae)-like 2




NM_018171




FLJ10659


hypothetical protein FLJ10659 




ooniigoou** / r\w 




TTYH1


tweety (Drosophila) homolog 1 


Contig54968_RC 


0.534754 




Homo sapiens cDNA FLJ13558 fis,
clone PLACE1007743






-u.oo4oy4 


KIAA1691


KIAA1691 protein 




NM_005264 


0.534057 


GFRA1 


GDNF family receptor alpha 1 




NM_014036 


-0.533638 


SBBI42 


BCM-like membrane protein 
precursor 


NM_018101 


-0.533473 


FLJ10468 


hypothetical protein FLJ10468




Contig56765_RC 


0.533442 




ESTs, Moderately similar to 
KU^fcio.z [Ceiegans] 




AB006746 


-0.533400 


PLSCR1 


phospholipid scramblase 1 




NM_001089 


0.533350 


ABCA3 


ATP-binding cassette, sub-family A 
(ABC1), member 3 






FLJ10709


hypothetical protein FLJ10709




X94232 


-0.532925 


MAPRE2 


microtubule-associated protein, 
RP/EB family, member 2






-0 *M9Qin 


MY01H 
vITU IU 


myosin a 




Contig292_RC 


0.532853 


FLJ22386 


hypothetical protein FLJ22386 


NM_000101 


-0.532767 


CYBA 


cytochrome b-245, alpha 
polypeptide 




Contig47814_RC 


-0.532656 


HHGP 


HHGP protein 

WO 02/103320 



PCT/US02/18947 



Identifier 


Correlation 


Name 


Description 


NM_014320


-0.532430 


SOUL 


putative heme-binding protein


NM_020347


0.531976 


LZTFL1


leucine zipper transcription factor-
like 1


NM_004323


0.531936


BAG1 


BCL2-associated athanogene 




-0.531914




ESTs 


Contig11648_RC


0.531704




ESTs 


NM_018131


-0.531559 


FLJ10540


hypothetical protein FLJ10540 


NM_004688


-0.531329 


NMI


N-myc (and STAT) interactor


NM_014870


0.531101 


KIAA0478 


KIAA0478 gene product


Contig31424_RC


0.530720 




ESTs


NM_000874 


-0.530545 


IFNAR2


interferon (alpha, beta and omega) 
receptor 2 


Contig50588_RC 


0.530145 




ESTs


NM_016463


0.529998 


HSPC195 


hypothetical protein 


NM_013324 


0.529966 


CISH 


cytokine inducible SH2-containing 
protein 


NM_006705


0.529840 


GADD45G 


growth arrest and DNA-damage-
inducible, gamma


Contig38901_RC


-0.529747 




ESTs 


NM_004184 


-0.529635 


WARS 


tryptophanyl-tRNA synthetase 


NM_015955


-0.529538 


LOC51072 


CGI-27 protein 


AF151810 


0.529416 


CGI-52 


similar to phosphatidylcholine 
transfer protein 2 


NM_002164 


-0.529117 


INDO 


indoleamine-pyrrole 2,3 
dioxygenase 


NM_004267 


-0.528679 


CHST2 


carbohydrate (chondroitin 6/keratan) 
sulfotransferase 2 


Contig32185_RC 


-0.528529 




Homo sapiens cDNA FLJ13997 fis, 
clone Y79AA1002220


NM_004154 


-0.528343 


P2RY6 


pyrimidinergic receptor P2Y, G- 
protein coupled, 6 


NM_005235 


0.528294 


ERBB4 


v-erb-a avian erythroblastic
leukemia viral oncogene homolog-
like 4


Contig40208_RC


-0.528062 


LOC56938 


transcription factor BMAL2 


NM_013262 


0.527297 


MIR 


myosin regulatory light chain 
interacting protein 


NM_003034 


-0.527148 


SIAT8A 


sialyltransferase 8 (alpha-N- 
acetylneuraminate: alpha-2,8- 
sialytransferase, GD3 synthase) A



NM_004556 


-0.527146 


NFKBIE 


nuclear factor of kappa light 
polypeptide gene enhancer in B- 
cells inhibitor, epsilon 


NM_002046 


-0.527051 


GAPD 


glyceraldehyde-3-phosphate 
dehydrogenase 


NM_001905 


-0.526986 


CTPS 


CTP synthase 


Contig42402_RC 


0.526852 




ESTs 


NM_014272 


-0.526283 


ADAMTS7 


a disintegrin-like and 
metalloprotease (reprolysin type) 
with thrombospondin type 1 motif, 7 


AF076612 


0.526205 


CHRD 


chordin 


Contig57725_RC 


-0.526122 




Homo sapiens mRNA for HMG-box 
transcription factor TCF-3, complete 
cds 


Contig42041_RC 


-0.525877 




ESTs 


Contig44656_RC 


-0.525868 




ESTs, Highly similar to S02392 
alpha-2-macroglobulin receptor 
precursor [H.sapiens] 


NM_018004


-0.525610 


FLJ10134 


hypothetical protein FLJ10134 


Contig56434_RC 


0.525510 




Homo sapiens cDNA FLJ13603 fis,
clone PLACE1010270 


D25328 


-0.525504 


PFKP 


phosphofructokinase, platelet 


Contig55950_RC 


-0.525358 


FLJ22329 


hypothetical protein FLJ22329 


NM_002648 


-0.525211 


PIM1 


pim-1 oncogene 


AL157505


0.525186 




Homo sapiens mRNA; cDNA 
DKFZp586P1124 (from clone
DKFZp586P1124) 


AF061034 


-0.525185 


FIP2 


Homo sapiens FIP2 alternatively 
translated mRNA, complete cds. 


NM_014721 


-0.525102 


KIAA0680 


KIAA0680 gene product 


NM_001634 


-0.525030 


AMD1 


S-adenosylmethionine 
decarboxylase 1 


NM_006304




DSS1


deleted in split-hand/split-foot 1
region


Contig37778_RC 


0.524667 




ESTs, Highly similar to HLHUSB 
MHC class II histocompatibility 
antigen HLA-DP alpha-1 chain
precursor [H.sapiens] 


NM_003099 


0.524339 


SNX1 


sorting nexin 1 


AL079298 


0.523774 


MCCC2 


methylcrotonoyl-Coenzyme A 
carboxylase 2 (beta) 


NM_019013


-0.523663 


FLJ10156 


hypothetical protein 


NM_000397 


-0.523293 


CYBB 


cytochrome b-245, beta polypeptide
(chronic granulomatous disease)


NM_014811 


0.523132 


KIAA0649 


KIAA0649 gene product 


Contig20600_RC 


0.523072 




ESTs 


NM_005190 


-0.522710 


CCNC 


cyclin C 


AL161960 


-0.522574 


FLJ21324 


hypothetical protein FLJ21324 


AL117502


0.522280 




Homo sapiens mRNA; cDNA 
DKFZp434D0935 (from clone 
DKFZp434D0935) 


AF131753


-0.522245 




Homo sapiens clone 24859 mRNA
sequence


NM_000320




QDPR


quinoid dihydropteridine reductase


NM_002115 


-0.521870 


HK3 


hexokinase 3 (white cell) 


NM_006460


0.521696 


HIS1 


HMBA-inducible


NM_018683 


-0.521679 


ZNF313 


zinc finger protein 313 


NM_004305 


-0.521539 


BIN1 


bridging integrator 1 


NM_006770 


-0.521538 


MARCO 


macrophage receptor with 
collagenous structure 


NM_001166 


-0.521530 


BIRC2 


baculoviral IAP repeat-containing 2 


D42047 


0.521522 


KIAA0089 


KIAA0089 protein 


NM_016235 


-0.521298 


GPRC5B 


G protein-coupled receptor, family
C, group 5, member B


NM_004504


-u.o^ i icy 


HRB


HIV-1 Rev binding protein


NM_002727




PRG1


proteoglycan 1, secretory granule


AB029031 


-0.520761 


KIAA1108 


KIAA1108 protein 


NM_005556 


-0.520692 


KRT7 


keratin 7 


NM_018031 


0.520600 


WDR6 


WD repeat domain 6 


AL117523


-0.520579 


KIAA1053 


KIAA1053 protein 


NM_004515


-0.520363


ILF2 


interleukin enhancer binding factor 
2, 45kD 


NM_004708 


-0.519935 


PDCD5 


programmed cell death 5 


NM_005935 


0.519765 


MLLT2 


myeloid/lymphoid or mixed-lineage
leukemia (trithorax (Drosophila) 
homolog); translocated to, 2 


Contig49289_RC 


-0.519546 




Homo sapiens mRNA; cDNA 
DKFZp586J1119 (from clone
DKFZp586J1119); complete cds


NM_000211 


-0.519342 


ITGB2 


integrin, beta 2 (antigen CD18 (p95), 
lymphocyte function-associated 
antigen 1; macrophage antigen 1
(mac-1) beta subunit)



AL079276 


0.519207 


LOC58495 


putative zinc finger protein from 
EUROIMAGE 566589 


Contig57825_RC 


0.519041 




ESTs 


NM_002466 


-0.518911 


MYBL2 


v-myb avian myeloblastosis viral 
oncogene homolog-like 2 


NM_016072


-0.518802 


LOC51026 


CGI-141 protein 


AB007950 


-0.518699 


KIAA0481 


KIAA0481 gene product 


NM_001550 


-0.518549 


IFRD1 


interferon-related developmental 
regulator 1 


AF155120 


-0.518221 


UBE2V1 


ubiquitin-conjugating enzyme E2 
variant 1 


Contig49849_RC 


0.517983 




ESTs, Weakly similar to AF188706_1
g20 protein [H.sapiens]


NM_016625 


-0.517936 


LOC51319 


hypothetical protein 


NM_004049 


-0.517862 


BCL2A1 


BCL2-related protein A1 


Contig50719_RC


0.517740 




ESTs 


D80010 


-0.517620 


LPIN1 


lipin 1 


NM_000299 


-0.517405 


PKP1 


plakophilin 1 (ectodermal
dysplasia/skin fragility syndrome)


AL049365 


0.517080 


FTL 


ferritin, light polypeptide 


Contig65227 


0.517003 




ESTs 


NM_004865 


-0.516808 


TBPL1 


TBP-like 1 


Contig54813_RC 


0.516246 


FLJ13962


hypothetical protein FLJ13962 


NM_003494 


-0.516221 


DYSF 


dysferlin, limb girdle muscular 
dystrophy 2B (autosomal recessive) 


NM_004431 


-0.516212 


EPHA2 


EphA2 


AL117600


-0.516067 


DKFZP564 
J0863 


DKFZP564J0863 protein 


AL080209 


-0.516037 


DKFZP586 
F2423 


hypothetical protein 
DKFZp586F2423 


NM_000135 


-0.515613 


FANCA 


Fanconi anemia, complementation 
group A 


NM_000050 


-0.515494 


ASS 


argininosuccinate synthetase 


NM_001830 


-0.515439 


CLCN4 


chloride channel 4 


NM_018234 


-0.515365 


FLJ10829 


hypothetical protein FLJ10829 


Contig53307_RC 


0.515328 




ESTs, Highly similar to KIAA1437 
protein [H.sapiens] 


AL117617


-0.515141 




Homo sapiens mRNA; cDNA 
DKFZp564H0764 (from clone 
DKFZp564H0764) 


NM_002906


-0.515098 


RDX 


radixin 


NM_003360 


-0.514427 


UGT8 


UDP glycosyltransferase 8 (UDP- 
galactose ceramide 
galactosyltransferase) 


NM_018478 


0.514332 


HSMNP1 


uncharacterized hypothalamus 
protein HSMNP1 




M90657 


-0.513908 


TM4SF1 


transmembrane 4 superfamily
member 1




NM_014967


0.513793 


KIAA1018 


KIAA1018 protein




Contig1462_RC 


0.513604 


C110RF1 

rr 
O 


chromosome 11 open reading frame

ID 


Contig37287_RC 


-0.513324 




ESTs




NM_000355 


-0.51 oZZo 




transcobalamin II; macrocytic
anemia




AB037756 


0.512914 


KIAA1335 


hypothetical protein KIAA1335 




Contig842_RC


-0.512880 




ESTs




NM_018186


-0.512878


rl_J107UD 


nypoineiicaj proxein plj iu/ud 


NM_014668


0.512746 


KIAA0575 


KIAA0575 gene product 




NM_003226


0.512611 


TFF3 


trefoil factor 3 (intestinal) 




Contig56457_RC 


-0.512548 


TMEFF1 


transmembrane protein with EGF- 
like and two follistatin-like domains 1 


AL050367 


-0.511999 




Homo sapiens mRNA; cDNA
DKFZp564A026 (from clone
DKFZp564A026)


NM_014791


-0.51 19oo 


KIAA0175


KIAA0175 gene product




Contig36312_RC


0.511794




ESTs




NM_004811


-0.511447


LPXN 


leupaxin 




Contig67182_RC 


-0.511416 




ESTs, Highly similar to epithelial V-
like antigen precursor [H.sapiens]


Contig52723_RC


-0.511134




ESTs




Contig17105_RC






Homo sapiens mRNA for putative
cytoplasmatic protein (ORF1-FL21)




NM_014449 


0.511023 


A 


protein "A" 




Contig52957_RC


0.510815 




ESTs 




Contig49388_RC


0.510582 


FLJ13322 


hypothetical protein FLJ13322


NM_017786


0.510557 


FLJ20366 


hypothetical protein FLJ20366 




AL157476


0.510478




Homo sapiens mRNA; cDNA 
DKFZp761C082 (from clone 
DKFZp761C082) 




NM_001919 


0.510242 


DCI 


dodecenoyl-Coenzyme A delta 
isomerase (3,2 trans-enoyl- 
Coenzyme A isomerase) 


NM_000268 


-0.510165 


NF2 


neurofibromin 2 (bilateral acoustic
neuroma)


NM_016210 


0.510018 


LOC51161 


g20 protein 


Contig45816_RC 


-0.509977 




ESTs 


NM_003953 


-o.ooyyba 


MPZL1 


myelin protein zero-like 1


NM_000057 


-0.509669 


BLM 


Bloom syndrome 


NM_014452 


-0.509473 


DR6 


death receptor 6 


Contig45156_RC 


0.509284 




ESTs, Moderately similar to motor 
domain of KIF12 [M.musculus] 


NM_006943 


0.509149 


SOX22 


SRY (sex determining region Y)-box 
22 


NM_000594 


-0.509012 


TNF 


tumor necrosis factor (TNF 
superfamily, member 2) 


AL137316 


-0.508353 


KIAA1609


KIAA1609 protein 


NM_000557 


-0.508325 


GDF5 


growth differentiation factor 5 
(cartilage-derived morphogenetic 
protein-1)


NM_018685


-0.508307 


ANLN 


anillin (Drosophila Scraps homolog), 
actin binding protein 


Contig53401_RC 


0.508189 




ESTs 


NM_014364






glyceraldehyde-3-phosphate
dehydrogenase, testis-specific


Contig50297_RC 


0.508137 




ESTs, Moderately similar to 
ALU8_HUMAN ALU SUBFAMILY
SX SEQUENCE CONTAMINATION 
WARNING ENTRY [H.sapiens] 


Contig51800 


0.507891 




ESTs, Weakly similar to 
ALU6_HUMAN ALU SUBFAMILY
SP SEQUENCE CONTAMINATION 
WARNING ENTRY [H.sapiens] 


Contig49098_RC 


-0.507716 


MGC4090 


hypothetical protein MGC4090 


NM_002985 


-0.507554 


SCYA5 


small inducible cytokine A5 
(RANTES) 


AB007899 


0.507439 


KIAA0439 


KIAA0439 protein; homolog of yeast 
ubiquitin-protein ligase Rsp5 


AL110139 


0.507145 




Homo sapiens mRNA; cDNA 
DKFZp564O1763 (from clone
DKFZp564O1763)


Contig51117_RC 


0.507001 




ESTs 


NM_017660 


-0.506768 


FLJ20085 


hypothetical protein FLJ20085 


NM_018000 


0.506686 


FLJ10116 


hypothetical protein FLJ10116 


NM_005555


-0.506516 


KRT6B 


keratin 6B 


NM_005582


-0.506462 


LY64 


lymphocyte antigen 64 (mouse)
homolog, radioprotective, 105kD 


Contig47405_RC


0.506202 




ESTs


NM_014808


0.506173 


KIAA0793 


KIAA0793 gene product 


NM_004938


-0.506121 




death-associated protein kinase 1


NM_020659


-0.505793 


TTYH1


tweety (Drosophila) homolog 1


NM_006227


-0.505604 


PLTP 


phospholipid transfer protein


NM_014268






microtubule-associated protein,
RP/EB family, member 2 


NM_004711


0.504849 


SYNGR1 


synaptogyrin 1 


NM_004418


-0.504497 


DUSP2 


dual specificity phosphatase 2 


NM_003508


-0.504475 


FZD9 


frizzled (Drosophila) homolog 9 
GenBank Accession Number    SEQ ID NO    GenBank Accession Number    SEQ ID NO

AB002301    SEQ ID NO 4    NM_012391    SEQ ID NO 1406
AB004857    SEQ ID NO 8    NM_012428    SEQ ID NO 1412
AB007458    SEQ ID NO 12    NM_013233    SEQ ID NO 1418
AB014534    SEQ ID NO 29    NM_013253    SEQ ID NO 1422
AB018305    SEQ ID NO 34    NM_013262    SEQ ID NO 1425
AB020677    SEQ ID NO 36    NM_013372    SEQ ID NO 1434
AB020689    SEQ ID NO 37    NM_013378    SEQ ID NO 1435
AB023151    SEQ ID NO 41    NM_014096    SEQ ID NO 1450
AB023163    SEQ ID NO 43    NM_014242    SEQ ID NO 1464
AB028986    SEQ ID NO 48    NM_014314    SEQ ID NO 1472
AB029025    SEQ ID NO 50    NM_014398    SEQ ID NO 1486
AB032966    SEQ ID NO 53    NM_014402    SEQ ID NO 1488
AB032988    SEQ ID NO 57    NM_014476    SEQ ID NO 1496
AB033049    SEQ ID NO 63    NM_014521    SEQ ID NO 1499
AB033055    SEQ ID NO 66    NM_014585    SEQ ID NO 1504
AB037742    SEQ ID NO 73    NM_014597    SEQ ID NO 1506
AB041269    SEQ ID NO 96    NM_014642    SEQ ID NO 1510
AF000974    SEQ ID NO 97    NM_014679    SEQ ID NO 1517
AF042838    SEQ ID NO 111    NM_014680    SEQ ID NO 1518
AF052155    SEQ ID NO 119    NM_014700    SEQ ID NO 1520
AF055084    SEQ ID NO 125    NM_014723    SEQ ID NO 1523
AF063725    SEQ ID NO 129    NM_014770    SEQ ID NO 1530
AF070536    SEQ ID NO 133    NM_014785    SEQ ID NO 1534
AF070617    SEQ ID NO 135    NM_014817    SEQ ID NO 1539
AF073299    SEQ ID NO 136    NM_014840    SEQ ID NO 1541
AF079529    SEQ ID NO 140    NM_014878    SEQ ID NO 1546
AF090353    SEQ ID NO 141    NM_015493    SEQ ID NO 1564
AF116238    SEQ ID NO 155    NM_015523    SEQ ID NO 1568
AF151810    SEQ ID NO 171    NM_015544    SEQ ID NO 1570
AF220492    SEQ ID NO 185    NM_015623    SEQ ID NO 1572
AJ224741    SEQ ID NO 196    NM_015640    SEQ ID NO 1573
AJ250475    SEQ ID NO 201    NM_015721    SEQ ID NO 1576
AJ270996    SEQ ID NO 202    NM_015881    SEQ ID NO 1577
AJ272057    SEQ ID NO 203    NM_015937    SEQ ID NO 1582
AK000174    SEQ ID NO 211    NM_015964    SEQ ID NO 1586
AK000617    SEQ ID NO 215    NM_015984    SEQ ID NO 1587
AK000959    SEQ ID NO 222    NM_016000    SEQ ID NO 1591
AK001438    SEQ ID NO 229    NM_016018    SEQ ID NO 1593
AK001838    SEQ ID NO 233    NM_016066    SEQ ID NO 1601
AK002107    SEQ ID NO 238    NM_016073    SEQ ID NO 1603
AK002197    SEQ ID NO 239    NM_016081    SEQ ID NO 1604
AL035297    SEQ ID NO 241    NM_016140    SEQ ID NO 1611
AL049346    SEQ ID NO 243    NM_016223    SEQ ID NO 1622
AL049370    SEQ ID NO 245    NM_016267    SEQ ID NO 1629
AL049667    SEQ ID NO 249    NM_016307    SEQ ID NO 1633
AL080222    SEQ ID NO 276    NM_016364    SEQ ID NO 1639
Al 0Q67T7    SEQ ID NO 279    NM_016373    SEQ ID NO 1640
AL110163    SEQ ID NO 282    NM_016459    SEQ ID NO 1646
AL133057    SEQ ID NO 300    NM_016471    SEQ ID NO 1648
AL133096    SEQ ID NO 302    NM_016548    SEQ ID NO 1654
AL133572    SEQ ID NO 305    NM_016620    SEQ ID NO 1662
AL133619    SEQ ID NO 307    NM_016820    SEQ ID NO 1674
AL133623    SEQ ID NO 309    NM_017423    SEQ ID NO 1678
AL137347    SEQ ID NO 320    NM_017709    SEQ ID NO 1698
AL137381    SEQ ID NO 322    NM_017732    SEQ ID NO 1700
AL137461    SEQ ID NO 325    NM_017734    SEQ ID NO 1702
AL137540    SEQ ID NO 328    NM_017750    SEQ ID NO 1704
AL137555    SEQ ID NO 329    NM_017763    SEQ ID NO 1706
AL137638    SEQ ID NO 332    NM_017782    SEQ ID NO 1710
AL137639    SEQ ID NO 333    NM_017816    SEQ ID NO 1714
AL137663    SEQ ID NO 334    NM_018043    SEQ ID NO 1730
AL137761    SEQ ID NO 339    NM_018072    SEQ ID NO 1734
AL157431    SEQ ID NO 340    NM_018093    SEQ ID NO 1738
AL161960    SEQ ID NO 351    NM_018103    SEQ ID NO 1742
AL355708    SEQ ID NO 353    NM_018171    SEQ ID NO 1751
AL359053    SEQ ID NO 354    NM_018187    SEQ ID NO 1755
D26488    SEQ ID NO 359    NM_018188    SEQ ID NO 1756
D38521    SEQ ID NO 361    NM_018222    SEQ ID NO 1761
D50914    SEQ ID NO 367    NM_018228    SEQ ID NO 1762
D80001    SEQ ID NO 369    NM_018373    SEQ ID NO 1777
G26403    SEQ ID NO 380    NM_018390    SEQ ID NO 1781
K02276    SEQ ID NO 383    NM_018422    SEQ ID NO 1784
M21551    SEQ ID NO 394    NM_018509    SEQ ID NO 1792
M27749    SEQ ID NO 397    NM_018584    SEQ ID NO 1796
M28170    SEQ ID NO 398    NM_018653    SEQ ID NO 1797
M73547    SEQ ID NO 409    NM_018660    SEQ ID NO 1798
M80899    SEQ ID NO 411    NM_018683    SEQ ID NO 1799
NM_000067    SEQ ID NO 423    NM_019049    SEQ ID NO 1814
NM_000087    SEQ ID NO 427    NM_019063    SEQ ID NO 1815
NM_000090    SEQ ID NO 428    NM_020150    SEQ ID NO 1823
NM_000165    SEQ ID NO 444    NM_020987    SEQ ID NO 1848
NM_000168    SEQ ID NO 445    NM_021095    SEQ ID NO 1855
NM_000196    SEQ ID NO 449    NM_021242    SEQ ID NO 1867
NM_000269    SEQ ID NO 457    U41387    SEQ ID NO 1877
NM_000310    SEQ ID NO 466    U45975    SEQ ID NO 1878
NM_000396    SEQ ID NO 479    U58033    SEQ ID NO 1881
NM_000397    SEQ ID NO 480    U67784    SEQ ID NO 1884
NM_000597    SEQ ID NO 502    U68385    SEQ ID NO 1885
NM_000636    SEQ ID NO 509    U80736    SEQ ID NO 1890
NM_000888    SEQ ID NO 535    X00437    SEQ ID NO 1899
NM_000903    SEQ ID NO 536    X07203    SEQ ID NO 1904
NM_000930    SEQ ID NO 540    X16302    SEQ ID NO 1907
NM_000931    SEQ ID NO 541    X51630    SEQ ID NO 1908
NM_000969    SEQ ID NO 547    X57809    SEQ ID NO 1912
NM_000984    SEQ ID NO 548    X57819    SEQ ID NO 1913
NM_001026    SEQ ID NO 552    X58529    SEQ ID NO 1914
NM_001054    SEQ ID NO 554    X66087    SEQ ID NO 1916
NM_001179    SEQ ID NO 567    X69150    SEQ ID NO 1917
NM_001184    SEQ ID NO 568    X72475    SEQ ID NO 1918
NM_001204    SEQ ID NO 571    X74794    SEQ ID NO 1920
NM_001206    SEQ ID NO 572    X75315    SEQ ID NO 1921
NM_001218    SEQ ID NO 575    X84340    SEQ ID NO 1925
NM_001275    SEQ ID NO 586    X98260    SEQ ID NO 1928
NM_001394    SEQ ID NO 602    Y07512    SEQ ID NO 1931
NM_001424    SEQ ID NO 605    Y14737    SEQ ID NO 1932
NM_001448    SEQ ID NO 610    Z34893    SEQ ID NO 1934
NM_001504    SEQ ID NO 620    Contig237_RC    SEQ ID NO 1940
NM_001553    SEQ ID NO 630    Contig292_RC    SEQ ID NO 1942
NM_001674    SEQ ID NO 646    Contig372_RC    SEQ ID NO 1943
NM_001675    SEQ ID NO 647    Contig756_RC    SEQ ID NO 1955
NM_001725    SEQ ID NO 652    Contig842_RC    SEQ ID NO 1958
NM_001740    SEQ ID NO 656    Contig1632_RC    SEQ ID NO 1977
NM_0017^6    SEQ ID NO 659    Contig1826_RC    SEQ ID NO 1980
NM_001770    SEQ ID NO 664    Contig2237_RC    SEQ ID NO 1988
NM_001797    SEQ ID NO 670    Contig2915_RC    SEQ ID NO 2003
NM_00184*S    SEQ ID NO 680    Contig3164_RC    SEQ ID NO 2007
NM_001873    SEQ ID NO 684    Contig3252_RC    SEQ ID NO 2008
NM_001888    SEQ ID NO 687    Contig3940_RC    SEQ ID NO 2018
NM_001892    SEQ ID NO 688    Contig9259_RC    SEQ ID NO 2039
NM_001919    SEQ ID NO 694    Contig10268_RC    SEQ ID NO 2041
NM_001946    SEQ ID NO 698    Contig10437_RC    SEQ ID NO 2043
NM_001953    SEQ ID NO 699    Contig10973_RC    SEQ ID NO 2044
NM_001960    SEQ ID NO 704    Contig14390_RC    SEQ ID NO 2054
NM_001985    SEQ ID NO 709    Contig16453_RC    SEQ ID NO 2060
NM_002023    SEQ ID NO 712    Contig16759_RC    SEQ ID NO 2061
NM_002051    SEQ ID NO 716    Contig19551    SEQ ID NO 2070
NM_0020^3    SEQ ID NO 717    Contig24541_RC    SEQ ID NO 2088
NM_002164    SEQ ID NO 734    Contig25362_RC    SEQ ID NO 2093
NM_002200    SEQ ID NO 739    Contig25617_RC    SEQ ID NO 2094
NM_002201    SEQ ID NO 740    Contig25722_RC    SEQ ID NO 2096
NM_002213    SEQ ID NO 741    Contig26022_RC    SEQ ID NO 2099
NM_002250    SEQ ID NO 747    Contig27915_RC    SEQ ID NO 2114
NM_002512    SEQ ID NO 780    Contig28081_RC    SEQ ID NO 2116
NM_002542    SEQ ID NO 784    Contig28179_RC    SEQ ID NO 2118
NM_002561    SEQ ID NO 786    Contig28550_RC    SEQ ID NO 2119
NM_002615    SEQ ID NO 793    Contig29639_RC    SEQ ID NO 2127
NM_002686    SEQ ID NO 803    Contig29647_RC    SEQ ID NO 2128
NM_002709    SEQ ID NO 806    Contig30092_RC    SEQ ID NO 2130
NM_002742    SEQ ID NO 812    Contig30209_RC    SEQ ID NO 2132
NM_002775    SEQ ID NO 815    Contig32185_RC    SEQ ID NO 2156
NM_002971    SEQ ID NO 848    Contig32798_RC    SEQ ID NO 2161
NM_002982    SEQ ID NO 849    Contig33230_RC    SEQ ID NO 2163
NM_003104    SEQ ID NO 870    Contig33394_RC    SEQ ID NO 2165
NM_003118    SEQ ID NO 872    Contig36323_RC    SEQ ID NO 2197
NM_003144    SEQ ID NO 876    Contig36761_RC    SEQ ID NO 2201
NM_003165    SEQ ID NO 882    Contig37141_RC    SEQ ID NO 2209
NM_003197    SEQ ID NO 885    Contig37778_RC    SEQ ID NO 2218
NM_003202    SEQ ID NO 886    Contig38285_RC    SEQ ID NO 2222
NM_003217    SEQ ID NO 888    Contig38520_RC    SEQ ID NO 2225
NM_003283    SEQ ID NO 898    Contig38901_RC    SEQ ID NO 2232
NM_003462    SEQ ID NO 911    Contig39826_RC    SEQ ID NO 2241
NM_003500    SEQ ID NO 918    Contig40212_RC    SEQ ID NO 2251
NM_003561    SEQ ID NO 925    Contig40712_RC    SEQ ID NO 2257
NM_003607    SEQ ID NO 930    Contig41402_RC    SEQ ID NO 2265
NM_003633    SEQ ID NO 933    Contig41635_RC    SEQ ID NO 2272
NM_003641    SEQ ID NO 934    Contig42006_RC    SEQ ID NO 2280
NM_003683    SEQ ID NO 943    Contig42220_RC    SEQ ID NO 2286
NM_003729    SEQ ID NO 949    Contig42306_RC    SEQ ID NO 2287
NM_003793    SEQ ID NO 954    Contig43918_RC    SEQ ID NO 2312
NM_003829    SEQ ID NO 958    Contig44195_RC    SEQ ID NO 2316
NM_003866    SEQ ID NO 961    Contig44265_RC    SEQ ID NO 2318
NM_003904    SEQ ID NO 967    Contig44278_RC    SEQ ID NO 2319
NM_003953    SEQ ID NO 974    Contig44757_RC    SEQ ID NO 2329
NM_004024    SEQ ID NO 982    Contig45588_RC    SEQ ID NO 2349
NM_004053    SEQ ID NO 986    Contig46262_RC    SEQ ID NO 2361
NM_004295    SEQ ID NO 1014    Contig46288_RC    SEQ ID NO 2362
NM_004438    SEQ ID NO 1038    Contig46343_RC    SEQ ID NO 2363
NM_004559    SEQ ID NO 1057    Contig46452_RC    SEQ ID NO 2366
NM_004616    SEQ ID NO 1065    Contig46868_RC    SEQ ID NO 2373
NM_004741    SEQ ID NO 1080    Contig46937_RC    SEQ ID NO 2377
NM_004772    SEQ ID NO 1084    Contig48004_RC    SEQ ID NO 2393
NM_004791    SEQ ID NO 1086    Contig48249_RC    SEQ ID NO 2397
NM_004848    SEQ ID NO 1094    Contig48774_RC    SEQ ID NO 2405
NM_004866    SEQ ID NO 1097    Contig48913_RC    SEQ ID NO 2411
NM_005128    SEQ ID NO 1121    Contig48945_RC    SEQ ID NO 2412
NM_005148    SEQ ID NO 1124    Contig48970_RC    SEQ ID NO 2413
NM_005196    SEQ ID NO 1127    Contig49233_RC    SEQ ID NO 2419
NM_005326    SEQ ID NO 1140    Contig49289_RC    SEQ ID NO 2422
NM_005518    SEQ ID NO 1161    Contig49342_RC    SEQ ID NO 2423
NM_005538    SEQ ID NO 1163    Contig49510_RC    SEQ ID NO 2430
NM_005557    SEQ ID NO 1170    Contig49855    SEQ ID NO 2440
NM_005718    SEQ ID NO 1189    Contig49948_RC    SEQ ID NO 2442
NM_005804    SEQ ID NO 1201    Contig50297_RC    SEQ ID NO 2451
NM_005824    SEQ ID NO 1203    Contig50669_RC    SEQ ID NO 2458
NM_005935    SEQ ID NO 1220    Contig50673_RC    SEQ ID NO 2459
NM_006002    SEQ ID NO 1225    Contig50838_RC    SEQ ID NO 2465
NM_006148    SEQ ID NO 1249    Contig51068_RC    SEQ ID NO 2471
NM_006235    SEQ ID NO 1257    Contig51929    SEQ ID NO 2492
NM_006271    SEQ ID NO 1261    Contig51953_RC    SEQ ID NO 2493
NM_006287    SEQ ID NO 1264    Contig52405_RC    SEQ ID NO 2502
NM_006296    SEQ ID NO 1267    Contig52543_RC    SEQ ID NO 2505
NM_006378    SEQ ID NO 1275    Contig52720_RC    SEQ ID NO 2513
NM_006461    SEQ ID NO 1287    Contig53281_RC    SEQ ID NO 2530
NM_006573    SEQ ID NO 1300    Contig53598_RC    SEQ ID NO 2537
NM_006622    SEQ ID NO 1302    Contig53757_RC    SEQ ID NO 2543
NM_006696    SEQ ID NO 1308    Contig53944_RC    SEQ ID NO 2545
NM_006769    SEQ ID NO 1316    Contig54425    SEQ ID NO 2561
NM_006787    SEQ ID NO 1319    Contig54547_RC    SEQ ID NO 2565
NM_006875    SEQ ID NO 1334    Contig54757_RC    SEQ ID NO 2574
NM_006885    SEQ ID NO 1335    Contig54916_RC    SEQ ID NO 2581
NM_006918    SEQ ID NO 1339    Contig55770_RC    SEQ ID NO 2604
NM_006923    SEQ ID NO 1340    Contig55801_RC    SEQ ID NO 2606
NM_006941    SEQ ID NO 1342    Contig56143_RC    SEQ ID NO 2619
NM_007070    SEQ ID NO 1354    Contig56160_RC    SEQ ID NO 2620
NM_007088    SEQ ID NO 1356    Contig56303_RC    SEQ ID NO 2626
NM_007146    SEQ ID NO 1358    Contig57023_RC    SEQ ID NO 2639
NM_007173    SEQ ID NO 1359    Contig57138_RC    SEQ ID NO 2644
NM_007246    SEQ ID NO 1366    Contig57609_RC    SEQ ID NO 2657
NM_007358    SEQ ID NO 1374    Contig58301_RC    SEQ ID NO 2667
NM_012135    SEQ ID NO 1385    Contig58512_RC    SEQ ID NO 2670
NM_012151    SEQ ID NO 1387    Contig60393    SEQ ID NO 2674
NM_012258    SEQ ID NO 1396    Contig60509_RC    SEQ ID NO 2675
NM_012317    SEQ ID NO 1399    Contig61254_RC    SEQ ID NO 2677
NM_012337    SEQ ID NO 1403    Contig62306    SEQ ID NO 2680
NM_012339    SEQ ID NO 1404    Contig64502    SEQ ID NO 2689
sporadic tumors.


Identifier 


Correlation 


Sequence 
Name 


Description 


NM_001892 


-0.651689 


CSNK1A1 


casein kinase 1, alpha 1


NM_018171 


-0.637696 


FLJ10659


hypothetical protein FLJ10659 


Contig40712_RC 


-0.612509 




ESTs 


NM_001204 


-0.608470 


BMPR2 


bone morphogenetic protein 
receptor, type II (serine/threonine 
kinase) 


NM_005148


-0.598612 


UNC119


unc119 (C.elegans) homolog


G26403 


0.585054 


YWHAH


tyrosine 3-
monooxygenase/tryptophan 5-
monooxygenase activation protein,
eta polypeptide


NM_015640 


0.583397 


PAI-RBP1 


PAI-1 mRNA-binding protein


Contig9259_RC 


0.581362 




ESTs 


AB033049 


-0.578750 


KIAA1223 


KIAA1223 protein 


NM_015523 


0.576029 


DKFZP566E 
144 


small fragment nuclease 


Contig41402_RC 


-0.571650 




Human DNA sequence from clone 
RP11-16L21 on chromosome 9. 
Contains the gene for NADP- 
dependent leukotriene B4 12- 
hydroxydehydrogenase, the gene 

IUI ct IlUVcl UWCkxJ (JvJIIIClill piuitsiii 

similar to Drosophila, C. elegans 
and Arabidopsis predicted proteins, 
the GNG10 gene for guanine 
nucleotide binding protein 10, a 
novel gene, ESTs, STSs, GSSs 
and six CpG islands 


NM_004791 


-0.564819 


ITGBL1 


integrin, beta-like 1 (with EGF-like 
repeat domains) 


NM_007070 


0.561173 


FAP48 


FKBP-associated protein 


NM_014597 


0.555907 


HSU 15552 


acidic 82 kua protein mRNA 


AF000974 


0.547194 


TRIP6 


thyroid hormone receptor interactor 

0 


NM_016073 


-0.547072 


CGI-142 


CGI-142 | 


Contig3940_RC 


0.544073 


YWHAH 


tyrosine 3- 

monooxygenase/tryptophan 5- 
monooxygenase activation protein, 
eta polypeptide 


NM_003683 


0.542219 


D21S2056E 


DNA segment on chromosome 21 
(unique) 2056 expressed sequence 
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Correlation 


Sequence 
Name 


Description 




Contig58512_RC  -0.528458                   Homo sapiens pancreas tumor-related protein (FKSG12) mRNA, complete cds
NM_003904       0.521223     ZNF259         zinc finger protein 259
Contig26022_RC  0.517351                    ESTs
Contig48970_RC  -0.516953    KIAA0892       KIAA0892 protein
NM_016307       -0.515398    PRX2           paired related homeobox protein
AL137761        [illegible]                 Homo sapiens mRNA; cDNA DKFZp586L2424 (from clone DKFZp586L2424)
NM_001919       -0.514799    DCI            dodecenoyl-Coenzyme A delta isomerase (3,2 trans-enoyl-Coenzyme A isomerase)
NM_000196       -0.514004    HSD11B2        hydroxysteroid (11-beta) dehydrogenase 2
NM_002200       0.513149     IRF5           interferon regulatory factor 5
AL133572        0.511340                    Homo sapiens mRNA; cDNA DKFZp434I0535 (from clone DKFZp434I0535); partial cds
NM_019063       0.511127     C2ORF2         chromosome 2 open reading frame 2
Contig25617_RC  0.509506                    ESTs
NM_007358       0.508145     M96            putative DNA binding protein
NM_014785       -0.507114    KIAA0258       KIAA0258 gene product
NM_006235       0.506585     POU2AF1        POU domain, class 2, associating factor 1
NM_014680       -0.505779    KIAA0100       KIAA0100 gene product
X66087          0.500842     MYBL1          v-myb avian myeloblastosis viral oncogene homolog-like 1
Y07512          -0.500686    PRKG1          protein kinase, cGMP-dependent, type I
NM_006296       0.500344     VRK2           vaccinia related kinase 2
Contig44278_RC  0.498260     DKFZP434K114   DKFZP434K114 protein
Contig56160_RC  -0.497695                   ESTs
NM_002023       -0.497570    FMOD           fibromodulin
M28170          0.497095     CD19           CD19 antigen
D26488          0.496511     KIAA0007       KIAA0007 protein
X72475          0.496125                    H.sapiens mRNA for rearranged Ig kappa light chain variable region (1.114)
K02276          0.496068     MYC            v-myc avian myelocytomatosis viral oncogene homolog
NM_013378       0.495648     VPREB3         pre-B lymphocyte gene 3
X58529          0.495608     IGHM           immunoglobulin heavy constant mu
NM_000168       -0.494260    GLI3           GLI-Kruppel family member GLI3 (Greig cephalopolysyndactyly syndrome)
NM_004866       -0.492967    SCAMP1         secretory carrier membrane protein 1
NM_013253       -0.491159    DKK3           dickkopf (Xenopus laevis) homolog 3
[illegible]     [illegible]  RPC            RNA 3'-terminal phosphate cyclase
[illegible]     0.487407     PIM2           pim-2 oncogene
[illegible]     0.487126     FLJ10709       hypothetical protein FLJ10709
NM_004848       0.485408     ICB-1          basement membrane-induced gene
NM_001179       0.483253     ART3           ADP-ribosyltransferase 3
NM_016548       -0.482329    LOC51280       golgi membrane protein GP73
NM_007146       -0.481994    ZNF161         zinc finger protein 161
NM_021242       -0.481754    STRAIT11499    hypothetical protein STRAIT11499
NM_016223       0.481710     PACSIN3        protein kinase C and casein kinase substrate in neurons 3
NM_003197       -0.481526    TCEB1L         transcription elongation factor B (SIII), polypeptide 1-like
NM_000067       -0.481003    CA2            carbonic anhydrase II
NM_006885       -0.479705    ATBF1          AT-binding transcription factor 1
NM_002542       0.478282     OGG1           8-oxoguanine DNA glycosylase
AL133619        -0.476596                   Homo sapiens mRNA; cDNA DKFZp434E2321 (from clone DKFZp434E2321); partial cds
D80001          0.476130     KIAA0179       KIAA0179 protein
NM_018660       -0.475548    LOC55893       papillomavirus regulatory factor PRF-1
AB004857        0.473440     SLC11A2        solute carrier family 11 (proton-coupled divalent metal ion transporters), member 2
NM_002250       [illegible]  KCNN4          potassium intermediate/small conductance calcium-activated channel, subfamily N, member 4
Contig56143_RC  -0.472611                   ESTs, Weakly similar to A54849 collagen alpha 1(VII) chain precursor [H.sapiens]
NM_001960       0.471502     EEF1D          eukaryotic translation elongation factor 1 delta (guanine nucleotide exchange protein)
Contig52405_RC  -0.470705                   ESTs, Weakly similar to ALU8_HUMAN ALU SUBFAMILY SX SEQUENCE CONTAMINATION WARNING ENTRY [H.sapiens]
Contig30092_RC  -0.469977                   Homo sapiens PR-domain zinc finger protein 6 isoform B (PRDM6) mRNA, partial cds; alternatively spliced
NM_003462       -0.468753    P28            dynein, axonemal, light intermediate polypeptide
Contig60393     0.468475                    ESTs
Contig842_RC    0.468158                    ESTs
NM_002982       0.466362     SCYA2          small inducible cytokine A2 (monocyte chemotactic protein 1, homologous to mouse Sig-je)
Contig14390_RC  0.464150                    ESTs
NM_001770       0.463847     CD19           CD19 antigen
AK000617        -0.463158                   Homo sapiens mRNA; cDNA DKFZp434L235 (from clone DKFZp434L235)
AF073299        -0.463007    SLC9A2         solute carrier family 9 (sodium/hydrogen exchanger), isoform 2
NM_019049       0.461990     FLJ20054       hypothetical protein
AL137347        -0.460778    DKFZP761M1511  hypothetical protein
NM_000396       -0.460263    CTSK           cathepsin K (pycnodysostosis)
NM_018373       -0.459268    FLJ11271       hypothetical protein FLJ11271
NM_002709       0.458500     PPP1CB         protein phosphatase 1, catalytic subunit, beta isoform
NM_016820       0.457516     OGG1           8-oxoguanine DNA glycosylase
Contig10268_RC  0.456933                    Human DNA sequence from clone RP11-196N14 on chromosome 20 Contains ESTs, STSs, GSSs and CpG islands. Contains three novel genes, part of a gene for a novel protein similar to protein serine/threonine phosphatase 4 regulatory subunit 1 (PP4R1) and a gene for a novel protein with an ankyrin domain
NM_014521       -0.456733    SH3BP4         SH3-domain binding protein 4
AJ272057        -0.456548    STRAIT11499    hypothetical protein STRAIT11499
NM_015964       -0.456187    LOC51673       brain specific protein
Contig16759_RC  -0.456169                   ESTs
NM_015937       -0.455954    LOC51604       CGI-06 protein
NM_007246       -0.455500    KLHL2          kelch (Drosophila)-like 2 (Mayven)
NM_001985       -0.453024    ETFB           electron-transfer-flavoprotein, beta polypeptide
NM_000984       -0.452935    RPL23A         ribosomal protein L23a
Contig51953_RC  -0.451695                   ESTs
NM_015984       0.450491     UCH37          ubiquitin C-terminal hydrolase UCH37
NM_000903       -0.450371    DIA4           diaphorase (NADH/NADPH) (cytochrome b-5 reductase)
NM_001797       -0.449862    CDH11          cadherin 11, type 2, OB-cadherin (osteoblast)
NM_014878       0.449818     KIAA0020       KIAA0020 gene product
NM_002742       -0.449590    PRKCM          protein kinase C, mu

Table 5. 231 gene markers that distinguish patients with good prognosis from patients 
with poor prognosis. 



GenBank                            GenBank
Accession Number  SEQ ID NO        Accession Number  SEQ ID NO
AA555029_RC       SEQ ID NO 1      [illegible]       SEQ ID NO 1427
[illegible]       [illegible]      [illegible]       SEQ ID NO 1439
[illegible]       [illegible]      [illegible]       SEQ ID NO 1449
[illegible]       [illegible]      [illegible]       SEQ ID NO 1451
[illegible]       [illegible]      NM_014321         SEQ ID NO 1477
[illegible]       [illegible]      [illegible]       SEQ ID NO 1480
AB037863          [illegible]      [illegible]       SEQ ID NO 1527
[illegible]       SEQ ID NO 120    NM_014754         SEQ ID NO 1528
AF052162          SEQ ID NO 121    NM_014791         SEQ ID NO 1535
AF055033          SEQ ID NO 124    [illegible]       SEQ ID NO 1545
AF073519          [illegible]      NM_014889         SEQ ID NO 1548
[illegible]       [illegible]      [illegible]       SEQ ID NO 1554
AF155117          [illegible]      [illegible]       [illegible]
AF161553          SEQ ID NO 177    [illegible]       [illegible]
AF201951          [illegible]      [illegible]       [illegible]
AF257175          [illegible]      NM_015984         SEQ ID NO 1587
AJ224741          [illegible]      [illegible]       SEQ ID NO 1636
AK000745          [illegible]      NM_016359         SEQ ID NO 1638
[illegible]       [illegible]      NM_016448         SEQ ID NO 1645
[illegible]       [illegible]      [illegible]       SEQ ID NO 1655
AL080059          SEQ ID NO 270    NM_016577         SEQ ID NO 1656
AL080079          SEQ ID NO 271    [illegible]       SEQ ID NO 1708
[illegible]       [illegible]      [illegible]       SEQ ID NO 1725
[illegible]       [illegible]      [illegible]       SEQ ID NO 1739
AL133619          SEQ ID NO 307    [illegible]       SEQ ID NO 1743
AL137295          [illegible]      [illegible]       SEQ ID NO 1745
AL137502          [illegible]      [illegible]       SEQ ID NO 1748
AL137514          SEQ ID NO 327    [illegible]       [illegible]
AL137718          [illegible]      NM_018354         SEQ ID NO 1774
AL355708          SEQ ID NO 353    NM_018401         SEQ ID NO 1782
D25328            SEQ ID NO 357    NM_018410         SEQ ID NO 1783
L27560            SEQ ID NO 390    NM_018454         SEQ ID NO 1786
M21551            SEQ ID NO 394    NM_018455         SEQ ID NO 1787
NM_000017         SEQ ID NO 416    NM_019013         SEQ ID NO 1809
NM_000096         SEQ ID NO 430    NM_020166         SEQ ID NO 1825
NM_000127         SEQ ID NO 436    NM_020188         SEQ ID NO 1830
NM_000158         SEQ ID NO 442    NM_020244         SEQ ID NO 1835
NM_000224         SEQ ID NO 453    NM_020386         SEQ ID NO 1838
NM_000286         SEQ ID NO 462    NM_020675         SEQ ID NO 1842
NM_000291         SEQ ID NO 463    NM_020974         SEQ ID NO 1844
NM_000320         SEQ ID NO 469    R70506_RC         SEQ ID NO 1868
NM_000436         SEQ ID NO 487    U45975            SEQ ID NO 1878
NM_000507         SEQ ID NO 491    U58033            SEQ ID NO 1881
NM_000599         SEQ ID NO 503    U82987            SEQ ID NO 1891
NM_000788         SEQ ID NO 527    U96131            SEQ ID NO 1896
NM_000849         SEQ ID NO 530    X05610            SEQ ID NO 1903
NM_001007         SEQ ID NO 550    X94232            SEQ ID NO 1927
NM_001124         SEQ ID NO 562    Contig753_RC      SEQ ID NO 1954
NM_001168         SEQ ID NO 566    Contig1778_RC     SEQ ID NO 1979
NM_001216         SEQ ID NO 574    Contig2399_RC     SEQ ID NO 1989
NM_001280         SEQ ID NO 588    Contig2504_RC     SEQ ID NO 1991
NM_001282         SEQ ID NO 589    Contig3902_RC     SEQ ID NO 2017
NM_001333         SEQ ID NO 597    Contig4595        SEQ ID NO 2022
NM_001673         SEQ ID NO 645    Contig8581_RC     SEQ ID NO 2037
NM_001809         SEQ ID NO 673    Contig13480_RC    SEQ ID NO 2052
NM_001827         SEQ ID NO 676    Contig17359_RC    SEQ ID NO 2068
NM_001905         SEQ ID NO 691    Contig20217_RC    SEQ ID NO 2072
NM_002019         SEQ ID NO 711    Contig21812_RC    SEQ ID NO 2082
NM_002073         SEQ ID NO 721    Contig24252_RC    SEQ ID NO 2087
NM_002358         SEQ ID NO 764    Contig25055_RC    SEQ ID NO 2090
NM_002570         SEQ ID NO 787    Contig25343_RC    SEQ ID NO 2092
NM_002808         SEQ ID NO 822    Contig25991       SEQ ID NO 2098
NM_002811         SEQ ID NO 823    Contig27312_RC    SEQ ID NO 2108
NM_002900         SEQ ID NO 835    Contig28552_RC    SEQ ID NO 2120
NM_002916         SEQ ID NO 838    Contig32125_RC    SEQ ID NO 2155
NM_003158         SEQ ID NO 881    Contig32185_RC    SEQ ID NO 2156
NM_003234         SEQ ID NO 891    Contig33814_RC    SEQ ID NO 2169
NM_003239         SEQ ID NO 893    Contig34634_RC    SEQ ID NO 2180
NM_003258         SEQ ID NO 896    Contig35251_RC    SEQ ID NO 2185
NM_003376         SEQ ID NO 906    Contig37063_RC    SEQ ID NO 2206
NM_003600         SEQ ID NO 929    Contig37598       SEQ ID NO 2216
NM_003607         SEQ ID NO 930    Contig38288_RC    SEQ ID NO 2223
NM_003662         SEQ ID NO 938    Contig40128_RC    SEQ ID NO 2248
NM_003676         SEQ ID NO 941    Contig40831_RC    SEQ ID NO 2260
NM_003748         SEQ ID NO 951    Contig41413_RC    SEQ ID NO 2266
NM_003862         SEQ ID NO 960    Contig41887_RC    SEQ ID NO 2276
NM_003875         SEQ ID NO 962    Contig42421_RC    SEQ ID NO 2291
NM_003878         SEQ ID NO 963    Contig43747_RC    SEQ ID NO 2311
NM_003882         SEQ ID NO 964    Contig44064_RC    SEQ ID NO 2315
NM_003981         SEQ ID NO 977    Contig44289_RC    SEQ ID NO 2320
NM_004052         SEQ ID NO 985    Contig44799_RC    SEQ ID NO 2330
NM_004163         SEQ ID NO 995    Contig45347_RC    SEQ ID NO 2344
NM_004336         SEQ ID NO 1022   Contig45816_RC    SEQ ID NO 2351
NM_004358         SEQ ID NO 1026   Contig46218_RC    SEQ ID NO 2358
NM_004456         SEQ ID NO 1043   Contig46223_RC    SEQ ID NO 2359
NM_004480         SEQ ID NO 1046   Contig46653_RC    SEQ ID NO 2369
NM_004504         SEQ ID NO 1051   Contig46802_RC    SEQ ID NO 2372
NM_004603         SEQ ID NO 1064   Contig47405_RC    SEQ ID NO 2384
NM_004701         SEQ ID NO 1075   Contig48328_RC    SEQ ID NO 2400
NM_004702         SEQ ID NO 1076   Contig49670_RC    SEQ ID NO 2434
NM_004798         SEQ ID NO 1087   Contig50106_RC    SEQ ID NO 2445
NM_004911         SEQ ID NO 1102   Contig50410       SEQ ID NO 2453
NM_004994         SEQ ID NO 1108   Contig50802_RC    SEQ ID NO 2463
NM_005196         SEQ ID NO 1127   Contig51464_RC    SEQ ID NO 2481
NM_005342         SEQ ID NO 1143   Contig51519_RC    SEQ ID NO 2482
NM_005496         SEQ ID NO 1157   Contig51749_RC    SEQ ID NO 2486
NM_005563         SEQ ID NO 1173   Contig51963       SEQ ID NO 2494
NM_005915         SEQ ID NO 1215   Contig53226_RC    SEQ ID NO 2525
NM_006096         SEQ ID NO 1240   Contig53268_RC    SEQ ID NO 2529
NM_006101         SEQ ID NO 1241   Contig53646_RC    SEQ ID NO 2538
NM_006115         SEQ ID NO 1245   Contig53742_RC    SEQ ID NO 2542
NM_006117         SEQ ID NO 1246   Contig55188_RC    SEQ ID NO 2586
NM_006201         SEQ ID NO 1254   Contig55313_RC    SEQ ID NO 2590
NM_006265         SEQ ID NO 1260   Contig55377_RC    SEQ ID NO 2591
NM_006281         SEQ ID NO 1263   Contig55725_RC    SEQ ID NO 2600
NM_006372         SEQ ID NO 1273   Contig55813_RC    SEQ ID NO 2607
NM_006681         SEQ ID NO 1306   Contig55829_RC    SEQ ID NO 2608
NM_006763         SEQ ID NO 1315   Contig56457_RC    SEQ ID NO 2630
NM_006931         SEQ ID NO 1341   Contig57595       SEQ ID NO 2655
NM_007036         SEQ ID NO 1349   Contig57864_RC    SEQ ID NO 2663
NM_007203         SEQ ID NO 1362   Contig58368_RC    SEQ ID NO 2668
NM_012177         SEQ ID NO 1390   Contig60864_RC    SEQ ID NO 2676
NM_012214         SEQ ID NO 1392   Contig63102_RC    SEQ ID NO 2684
NM_012261         SEQ ID NO 1397   Contig63649_RC    SEQ ID NO 2686
NM_012429         SEQ ID NO 1413   Contig64688       SEQ ID NO 2690
NM_013262         SEQ ID NO 1425

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Table 6. 70 Preferred prognosis markers drawn from Table 5. 





Identifier      Correlation  Sequence Name  Description
AL080059        -0.527150                   Homo sapiens mRNA for KIAA1750 protein, partial cds
Contig63649_RC  -0.468130                   ESTs
Contig46218_RC  -0.432540                   ESTs
NM_016359       -0.424930    LOC51203       clone HQ0310 PRO0310p1
AA555029_RC     -0.424120                   ESTs
NM_003748       0.420671     ALDH4          aldehyde dehydrogenase 4 (glutamate gamma-semialdehyde dehydrogenase; pyrroline-5-carboxylate dehydrogenase)
Contig38288_RC  -0.414970                   ESTs, Weakly similar to ISHUSS protein disulfide-isomerase [H. sapiens]
NM_003862       0.410964     FGF18          fibroblast growth factor 18
Contig28552_RC  -0.409260                   Homo sapiens mRNA; cDNA DKFZp434C0931 (from clone DKFZp434C0931); partial cds
Contig32125_RC  0.409054                    ESTs
U82987          0.407002     BBC3           Bcl-2 binding component 3
AL137718        -0.404980                   Homo sapiens mRNA; cDNA DKFZp434C0931 (from clone DKFZp434C0931); partial cds
AB037863        0.402335     KIAA1442       KIAA1442 protein
NM_020188       -0.400070    DC13           DC13 protein
NM_020974       0.399987     CEGP1          CEGP1 protein
NM_000127       -0.399520    EXT1           exostoses (multiple) 1
NM_002019       -0.398070    FLT1           fms-related tyrosine kinase 1 (vascular endothelial growth factor/vascular permeability factor receptor)
NM_002073       -0.395460    GNAZ           guanine nucleotide binding protein (G protein), alpha z polypeptide
NM_000436       -0.392120    OXCT           3-oxoacid CoA transferase
NM_004994       -0.391690    MMP9           matrix metalloproteinase 9 (gelatinase B, 92kD gelatinase, 92kD type IV collagenase)
Contig55377_RC  0.390600                    ESTs
Contig35251_RC  -0.390410                   Homo sapiens cDNA: FLJ22719 fis, clone HSI14307
Contig25991     -0.390370    ECT2           epithelial cell transforming sequence 2 oncogene
NM_003875       -0.386520    GMPS           guanine monphosphate synthetase
NM_006101       -0.385890    HEC            highly expressed in cancer, rich in leucine heptad repeats
NM_003882       0.384479     WISP1          WNT1 inducible signaling pathway protein 1
NM_003607       -0.384390    PK428          Ser-Thr protein kinase related to the myotonic dystrophy protein kinase
AF073519        -0.383340    SERF1A         small EDRK-rich factor 1A (telomeric)
AF052162        -0.380830    FLJ12443       hypothetical protein FLJ12443
NM_000849       0.380831     GSTM3          glutathione S-transferase M3 (brain)
Contig32185_RC  -0.379170                   Homo sapiens cDNA FLJ13997 fis, clone Y79AA1002220
NM_016577       -0.376230    RAB6B          RAB6B, member RAS oncogene family
Contig48328_RC  0.375252                    ESTs, Weakly similar to T17248 hypothetical protein DKFZp586G1122.1 [H.sapiens]
Contig46223_RC  0.374289                    ESTs
NM_015984       -0.373880    UCH37          ubiquitin C-terminal hydrolase UCH37
NM_006117       0.373290     PECI           peroxisomal D3,D2-enoyl-CoA isomerase
AK000745        -0.373060                   Homo sapiens cDNA FLJ20738 fis, clone HEP08257
Contig40831_RC  -0.372930                   ESTs
NM_003239       0.371524     TGFB3          transforming growth factor, beta 3
NM_014791       -0.370860    KIAA0175       KIAA0175 gene product
X05610          -0.370860    COL4A2         collagen, type IV, alpha 2
NM_016448       -0.369420    L2DTL          L2DTL protein
NM_018401       0.368349     HSA250839      gene for serine/threonine protein kinase
NM_000788       -0.367700    DCK            deoxycytidine kinase
Contig51464_RC  -0.367450    FLJ22477       hypothetical protein FLJ22477
AL080079        -0.367390    DKFZP564D0462  hypothetical protein DKFZp564D0462
NM_006931       -0.366490    SLC2A3         solute carrier family 2 (facilitated glucose transporter), member 3
AF257175        0.365900                    Homo sapiens hepatocellular carcinoma-associated antigen 64 (HCA64) mRNA, complete cds
NM_014321       -0.365810    ORC6L          origin recognition complex, subunit 6 (yeast homolog)-like
NM_002916       -0.365590    RFC4           replication factor C (activator 1) 4 (37kD)
Contig55725_RC  -0.365350                   ESTs, Moderately similar to T50635 hypothetical protein DKFZp762L0311.1 [H.sapiens]
Contig24252_RC  -0.364990                   ESTs
AF201951        0.363953     CFFM4          high affinity immunoglobulin epsilon receptor beta subunit
NM_005915       -0.363850    MCM6           minichromosome maintenance deficient (mis5, S. pombe) 6
NM_001282       0.363326     AP2B1          adaptor-related protein complex 2, beta 1 subunit
Contig56457_RC  -0.361650    TMEFF1         transmembrane protein with EGF-like and two follistatin-like domains 1
NM_000599       -0.361290    IGFBP5         insulin-like growth factor binding protein 5
NM_020386       -0.360780    LOC57110       H-REV107 protein-related protein
NM_014889       -0.360040    MP1            metalloprotease 1 (pitrilysin family)
AF055033        -0.359940    IGFBP5         insulin-like growth factor binding protein 5
NM_006681       -0.359700    NMU            neuromedin U
NM_007203       -0.359570    AKAP2          A kinase (PRKA) anchor protein 2
Contig63102_RC  0.359255     FLJ11354       hypothetical protein FLJ11354
NM_003981       -0.358260    PRC1           protein regulator of cytokinesis 1
Contig20217_RC  -0.357880                   ESTs
NM_001809       -0.357720    CENPA          centromere protein A (17kD)
Contig2399_RC   [illegible]  SM-20          similar to rat smooth muscle protein SM-20
NM_004702       -0.356600    CCNE2          cyclin E2
NM_007036       -0.356540    ESM1           endothelial cell-specific molecule 1
NM_018354       -0.356000    FLJ11190       hypothetical protein FLJ11190

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The sets of markers listed in Tables 1-6 partially overlap; in other words, 
some markers are present in multiple sets, while other markers are unique to a set (FIG. 1). 
Thus, in one embodiment, the invention provides a set of 256 genetic markers that can 
distinguish between ER(+) and ER(-), and also between BRCA1 tumors and sporadic tumors
(i.e., classify a tumor as ER(-) or ER(+) and BRCA1-related or sporadic). In a more specific
embodiment, the invention provides subsets of at least 20, at least 50, at least 100, or at least
150 of the set of 256 markers that can classify a tumor as ER(-) or ER(+) and BRCA1-related
or sporadic. In another embodiment, the invention provides 165 markers that can
distinguish between ER(+) and ER(-), and also between patients with good versus poor
prognosis (i.e., classify a tumor as either ER(-) or ER(+) and as having been removed from
a patient with a good prognosis or a poor prognosis). In a more specific embodiment, the
invention further provides subsets of at least 20, 50, 100 or 125 of the full set of 165
markers, which also classify a tumor as either ER(-) or ER(+) and as having been removed
from a patient with a good prognosis or a poor prognosis. The invention further provides a
set of twelve markers that can distinguish between BRCA1 tumors and sporadic tumors, and
between patients with good versus poor prognosis. Finally, the invention provides eleven
markers capable of differentiating all three statuses. Conversely, the invention provides
2,050 of the 2,460 ER-status markers that can determine only ER status, 173 of the 430
BRCA1 v. sporadic markers that can determine only BRCA1 v. sporadic status, and 65 of the
231 prognosis markers that can only determine prognosis. In more specific embodiments,
the invention also provides for subsets of at least 20, 50, 100, 200, 500, 1,000, 1,500 or
2,000 of the 2,050 ER-status markers that also determine only ER status. The invention
also provides subsets of at least 20, 50, 100 or 150 of the 173 markers that also determine
only BRCA1 v. sporadic status. The invention further provides subsets of at least 20, 30, 40,
or 50 of the 65 prognostic markers that also determine only prognostic status.
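
The overlap among the three marker sets described above amounts to ordinary set arithmetic. The following is a minimal sketch only: the identifiers are short placeholders rather than entries from Tables 1-6, and the function name `partition` is hypothetical. It also assumes that each pairwise overlap is counted inclusive of markers shared by all three sets, which the text does not state either way.

```python
# Toy marker sets; the identifiers are placeholders, not the real tables.
er_markers = {"g1", "g2", "g3", "g4", "g5"}
brca1_markers = {"g3", "g4", "g6", "g7"}
prognosis_markers = {"g4", "g5", "g7", "g8"}

def partition(er, brca1, prog):
    """Split three marker sets into shared and set-specific groups."""
    return {
        "all_three": er & brca1 & prog,
        "er_and_brca1": er & brca1,          # includes markers in all three
        "er_and_prognosis": er & prog,       # includes markers in all three
        "brca1_and_prognosis": brca1 & prog, # includes markers in all three
        "er_only": er - brca1 - prog,
        "brca1_only": brca1 - er - prog,
        "prognosis_only": prog - er - brca1,
    }

regions = partition(er_markers, brca1_markers, prognosis_markers)
for name, members in regions.items():
    print(name, sorted(members))
```

Applied to the real marker lists, the sizes of these groups would correspond to the counts recited above (for example, the markers that determine only ER status).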

Any of the sets of markers provided above may be used alone or
in combination with markers outside the set. For example, markers that distinguish ER-status
may be used in combination with the BRCA1 vs. sporadic markers, or with the
prognostic markers, or both. Any of the marker sets provided above may also be used in
combination with other markers for breast cancer, or for any other clinical or physiological
condition.

The relationship between the marker sets is diagramed in FIG. 1.

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5.3.2 IDENTIFICATION OF MARKERS 
The present invention provides sets of markers for the identification of 
conditions or indications associated with breast cancer. Generally, the marker sets were 
identified by determining which of approximately 25,000 human markers had expression patterns that
correlated with the conditions or indications.

In one embodiment, the method for identifying marker sets is as follows. 
After extraction and labeling of target polynucleotides, the expression of all markers (genes) 
in a sample X is compared to the expression of all markers in a standard or control. In one 
embodiment, the standard or control comprises target polynucleotide molecules derived
from a sample from a normal individual (i.e., an individual not afflicted with breast cancer).
In a preferred embodiment, the standard or control is a pool of target polynucleotide
molecules. The pool may be derived from collected samples from a number of normal
individuals. In a preferred embodiment, the pool comprises samples taken from a number
of individuals having sporadic-type tumors. In another preferred embodiment, the pool
comprises an artificially-generated population of nucleic acids designed to approximate the
level of nucleic acid derived from each marker found in a pool of marker-derived nucleic
acids derived from tumor samples. In yet another embodiment, the pool is derived from
normal or breast cancer cell lines or cell line samples.

The comparison may be accomplished by any means known in the art. For
example, expression levels of various markers may be assessed by separation of target
polynucleotide molecules (e.g., RNA or cDNA) derived from the markers in agarose or
polyacrylamide gels, followed by hybridization with marker-specific oligonucleotide
probes. Alternatively, the comparison may be accomplished by the labeling of target
polynucleotide molecules followed by separation on a sequencing gel. Polynucleotide
samples are placed on the gel such that patient and control or standard polynucleotides are
in adjacent lanes. Comparison of expression levels is accomplished visually or by means of
a densitometer. In a preferred embodiment, the expression of all markers is assessed
simultaneously by hybridization to a microarray. In each approach, markers meeting certain
criteria are identified as associated with breast cancer.

A marker is selected based upon significant difference of expression in a
sample as compared to a standard or control condition. Selection may be made based upon
either significant up- or down-regulation of the marker in the patient sample. Selection may
also be made by calculation of the statistical significance (i.e., the p-value) of the correlation
between the expression of the marker and the condition or indication. Preferably, both
selection criteria are used. Thus, in one embodiment of the present invention, markers
associated with breast cancer are selected where the markers show both more than two-fold
change (increase or decrease) in expression as compared to a standard, and the p-value for
the correlation between the existence of breast cancer and the change in marker expression
is no more than 0.01 (i.e., is statistically significant).
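
The two selection criteria above (more than two-fold change in either direction, and a p-value of no more than 0.01) can be sketched as a simple filter. This is an illustrative sketch only: the record layout, the gene names, and the function name `select_markers` are hypothetical, and the p-values are assumed to have been computed elsewhere.

```python
import math

def select_markers(records, min_fold=2.0, max_p=0.01):
    """Keep markers that change by more than `min_fold` in either direction
    and whose p-value is at most `max_p`.

    Each record is (name, expression ratio sample/standard, p-value).
    """
    log_cut = math.log2(min_fold)
    selected = []
    for name, ratio, p_value in records:
        # |log2 ratio| > 1 covers both >2-fold increase and >2-fold decrease.
        if abs(math.log2(ratio)) > log_cut and p_value <= max_p:
            selected.append(name)
    return selected

# Hypothetical example records.
records = [
    ("geneA", 2.5, 0.004),  # up more than two-fold, significant
    ("geneB", 0.4, 0.009),  # down more than two-fold, significant
    ("geneC", 1.5, 0.001),  # significant, but change is too small
    ("geneD", 3.0, 0.050),  # large change, but not significant
]
print(select_markers(records))
```

Working on the log ratio treats increases and decreases symmetrically, so a ratio of 0.4 is handled the same way as a ratio of 2.5.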

The expression of the identified breast cancer-related markers is then used to
identify markers that can differentiate tumors into clinical types. In a specific embodiment
using a number of tumor samples, markers are identified by calculation of correlation
coefficients between the clinical category or clinical parameter(s) and the linear, logarithmic
or any transform of the expression ratio across all samples for each individual gene.

Specifically, the correlation coefficient is calculated as

$$\rho = \frac{C \cdot T}{\lVert C \rVert \, \lVert T \rVert}$$

where C represents the clinical parameters or categories and T represents the linear,

logarithmic or any transform of the ratio of expression between sample and control.
Markers for which the coefficient of correlation exceeds a cutoff are identified as breast
cancer-related markers specific for a particular clinical type. Such a cutoff or threshold
corresponds to a certain significance of discriminating genes obtained by Monte Carlo
simulations. The threshold depends upon the number of samples used; the threshold can be
calculated as $3 \times 1/\sqrt{n-3}$, where $1/\sqrt{n-3}$ is the distribution width and n = the number of
samples. In a specific embodiment, markers are chosen if the correlation coefficient is
greater than about 0.3 or less than about -0.3.
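
As a rough illustration of the correlation cutoff, the sketch below computes a correlation coefficient between a vector of clinical categories and a vector of log expression ratios, together with the threshold of three distribution widths. It assumes the coefficient is the normalized inner product of the two vectors without mean-centering, which this text does not spell out; the function names and the toy data are hypothetical.

```python
import math

def correlation(c, t):
    """Normalized inner product of two equal-length vectors (assumed form)."""
    dot = sum(ci * ti for ci, ti in zip(c, t))
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    norm_t = math.sqrt(sum(ti * ti for ti in t))
    return dot / (norm_c * norm_t)

def threshold(n_samples):
    """Cutoff of three distribution widths, 3 * 1/sqrt(n - 3)."""
    return 3.0 / math.sqrt(n_samples - 3)

# Toy data: +1 / -1 encode two clinical categories for six samples.
category = [1, 1, 1, -1, -1, -1]
log_ratio = [0.8, 0.5, 0.9, -0.7, -0.4, -0.6]

rho = correlation(category, log_ratio)
print(rho, threshold(103))
```

With 103 samples the threshold works out to 3/sqrt(100) = 0.3, which is in line with the cutoff of about 0.3 mentioned above.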

Next, the significance of the correlation is calculated. This significance may
be calculated by any statistical means by which such significance is calculated. In a specific
example, a set of correlation data is generated using a Monte-Carlo technique to randomize
the association between the expression difference of a particular marker and the clinical
category. The frequency distribution of markers satisfying the criteria through calculation
of correlation coefficients is compared to the number of markers satisfying the criteria in the
data generated through the Monte-Carlo technique. The frequency distribution of markers
satisfying the criteria in the Monte-Carlo runs is used to determine whether the number of
markers selected by correlation with clinical data is significant. See Example 4.
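
A randomization of this kind might look like the following Python sketch. It is an illustration under stated assumptions: the clinical labels are shuffled across samples in each run, the correlation is taken as a normalized dot product, and the function names, the number of runs and the seed are all hypothetical choices rather than values from the specification.

```python
import math
import random

def _corr(c, r):
    """Normalized dot product (assumed form of the correlation coefficient)."""
    dot = sum(a * b for a, b in zip(c, r))
    denom = math.sqrt(sum(a * a for a in c)) * math.sqrt(sum(b * b for b in r))
    return dot / denom if denom else 0.0

def count_selected(expression, clinical, cutoff):
    """Number of genes whose |correlation| with the clinical vector exceeds cutoff."""
    return sum(1 for r in expression.values() if abs(_corr(clinical, r)) > cutoff)

def monte_carlo_significance(expression, clinical, cutoff=0.3, runs=1000, seed=0):
    """Compare the observed marker count with counts from label-shuffled runs.

    Returns (observed_count, fraction_of_runs_with_count_at_least_observed).
    """
    rng = random.Random(seed)
    observed = count_selected(expression, clinical, cutoff)
    shuffled = list(clinical)
    hits = 0
    for _ in range(runs):
        rng.shuffle(shuffled)  # break the gene/clinical association
        if count_selected(expression, shuffled, cutoff) >= observed:
            hits += 1
    return observed, hits / runs
```

A small returned fraction suggests that the number of selected markers is unlikely to arise when expression and clinical category are unrelated.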

Once a marker set is identified, the markers may be rank-ordered in order of
significance of discrimination. One means of rank ordering is by the amplitude of
correlation between the change in gene expression of the marker and the specific condition
being discriminated. Another, preferred means is to use a statistical metric. In a specific
embodiment, the metric is a Fisher-like statistic:

$$t = \frac{\langle x_1 \rangle - \langle x_2 \rangle}{\sqrt{\left[\sigma_1^2 (n_1 - 1) + \sigma_2^2 (n_2 - 1)\right] / (n_1 + n_2 - 1) / (1/n_1 + 1/n_2)}} \qquad \text{Equation (3)}$$

In this equation, $\langle x_1 \rangle$ is the error-weighted average of the log ratio of transcript expression
measurements within a first diagnostic group (e.g., ER(-)), $\langle x_2 \rangle$ is the error-weighted average
of log ratio within a second, related diagnostic group (e.g., ER(+)), $\sigma_1^2$ is the variance of the
log ratio within the ER(-) group and $n_1$ is the number of samples for which valid
measurements of log ratios are available. $\sigma_2^2$ is the variance of log ratio within the second
diagnostic group (e.g., ER(+)), and $n_2$ is the number of samples for which valid
measurements of log ratios are available. The t-value represents the variance-compensated
difference between two means.
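
The ranking metric can be sketched as follows. This is a hedged illustration: the grouping of terms follows one reading of Equation (3), the error-weighted average is assumed to use inverse-variance weights, and the helper names are hypothetical. The factor involving only the two sample counts does not change the ordering of markers when every marker has the same number of valid samples per group.

```python
import math

def weighted_mean(values, errors):
    """Error-weighted average, assuming weights of 1/error^2."""
    weights = [1.0 / (e * e) for e in errors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def variance(values):
    """Unbiased sample variance of the log ratios in one group."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def fisher_like_t(x1, e1, x2, e2):
    """Variance-compensated difference between two group means (one reading of Equation (3))."""
    n1, n2 = len(x1), len(x2)
    pooled = (variance(x1) * (n1 - 1) + variance(x2) * (n2 - 1)) / (n1 + n2 - 1)
    denom = math.sqrt(pooled / (1.0 / n1 + 1.0 / n2))
    return (weighted_mean(x1, e1) - weighted_mean(x2, e2)) / denom

def rank_markers(groups):
    """Sort gene names by decreasing |t|; groups maps gene -> (x1, e1, x2, e2)."""
    return sorted(groups, key=lambda g: abs(fisher_like_t(*groups[g])), reverse=True)
```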

The rank-ordered marker set may be used to optimize the number of markers
in the set used for discrimination. This is accomplished generally in a "leave one out"
method as follows. In a first run, a subset, for example 5, of the markers from the top of the
ranked list is used to generate a template, where out of X samples, X-1 are used to generate
the template, and the status of the remaining sample is predicted. This process is repeated
for every sample until every one of the X samples is predicted once. In a second run,
additional markers, for example 5, are added, so that a template is now generated from 10
markers, and the outcome of the remaining sample is predicted. This process is repeated
until the entire set of markers is used to generate the template. For each of the runs, type 1
errors (false negatives) and type 2 errors (false positives) are counted; the optimal number of
markers is that number where the type 1 error rate, or type 2 error rate, or preferably the
total of type 1 and type 2 error rates, is lowest.
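
The leave-one-out procedure can be illustrated with the Python sketch below. It makes several simplifying assumptions that are not in the text: templates are plain (unweighted) per-class averages, a sample is assigned to the class whose template it correlates with best, and only the total error count is tracked rather than type 1 and type 2 errors separately. All function names are hypothetical.

```python
import math

def _corr(a, b):
    """Normalized dot product between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0

def _template(profiles, markers):
    """Average expression of each selected marker over the given profiles."""
    return [sum(p[m] for p in profiles) / len(profiles) for m in markers]

def loo_errors(samples, labels, markers):
    """Count misclassifications when each sample in turn is left out."""
    errors = 0
    for i, held_out in enumerate(samples):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        classes = sorted(set(l for _, l in train))
        test = [held_out[m] for m in markers]
        predicted = max(
            classes,
            key=lambda c: _corr(_template([s for s, l in train if l == c], markers), test),
        )
        errors += int(predicted != labels[i])
    return errors

def optimal_marker_count(samples, labels, ranked_markers, step=5):
    """Grow the marker set from the top of the ranked list and keep the best size."""
    results = {}
    for k in range(step, len(ranked_markers) + 1, step):
        results[k] = loo_errors(samples, labels, ranked_markers[:k])
    best_k = min(results, key=lambda k: (results[k], k))  # ties go to the smaller set
    return best_k, results
```

Here `samples` is a list of per-sample expression vectors indexed by marker position, and `ranked_markers` is a list of those positions ordered by significance.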

For prognostic markers, validation of the marker set may be accomplished by
an additional statistic, a survival model. This statistic generates the probability of tumor
distant metastases as a function of time since initial diagnosis. A number of models may be
used, including Weibull, normal, log-normal, log logistic, log-exponential, or log-Rayleigh
(Chapter 12 "Life Testing", S-PLUS 2000 GUIDE TO STATISTICS, Vol. 2, p. 368 (2000)).
For the "normal" model, the probability of distant metastases P at time t is calculated as

$$P = \alpha \times \exp\left(-t^2/\tau^2\right) \qquad \text{Equation (4)}$$

where $\alpha$ is fixed and equal to 1, and $\tau$ is a parameter to be fitted and measures the
"expected lifetime".

It will be apparent to those skilled in the art that the above methods, in
particular the statistical methods, are not limited to the identification of
markers associated with breast cancer, but may be used to identify sets of marker genes
associated with any phenotype. The phenotype can be the presence or absence of a disease 
such as cancer, or the presence or absence of any identifying clinical condition associated 
with that cancer. In the disease context, the phenotype may be a prognosis such as a 
survival time, probability of distant metastases of a disease condition, or likelihood of a 
particular response to a therapeutic or prophylactic regimen. The phenotype need not be 
cancer, or a disease; the phenotype may be a nominal characteristic associated with a 
healthy individual. 



5.3.3 SAMPLE COLLECTION
In the present invention, target polynucleotide molecules are extracted from a
sample taken from an individual afflicted with breast cancer. The sample may be collected
in any clinically acceptable manner, but must be collected such that marker-derived
polynucleotides (i.e., RNA) are preserved. mRNA or nucleic acids derived therefrom (i.e.,
cDNA or amplified DNA) are preferably labeled distinguishably from standard or control
polynucleotide molecules, and both are simultaneously or independently hybridized to a
microarray comprising some or all of the markers or marker sets or subsets described above.
Alternatively, mRNA or nucleic acids derived therefrom may be labeled with the same label
as the standard or control polynucleotide molecules, wherein the intensity of hybridization
of each at a particular probe is compared. A sample may comprise any clinically relevant
tissue sample, such as a tumor biopsy or fine needle aspirate, or a sample of bodily fluid,
such as blood, plasma, serum, lymph, ascitic fluid, cystic fluid, urine or nipple exudate. The
sample may be taken from a human, or, in a veterinary context, from non-human animals
such as ruminants, horses, swine or sheep, or from domestic companion animals such as
felines and canines.

Methods for preparing total and poly(A)+ RNA are well known and are
described generally in Sambrook et al., MOLECULAR CLONING - A LABORATORY MANUAL
(2nd Ed.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
(1989) and Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, vol. 2, Current
Protocols Publishing, New York (1994).

RNA may be isolated from eukaryotic cells by procedures that involve lysis
of the cells and denaturation of the proteins contained therein. Cells of interest include
wild-type cells (i.e., non-cancerous), drug-exposed wild-type cells, tumor- or tumor-derived
cells, modified cells, normal or tumor cell line cells, and drug-exposed modified cells.

Additional steps may be employed to remove DNA. Cell lysis may be
accomplished with a nonionic detergent, followed by microcentrifugation to remove the
nuclei and hence the bulk of the cellular DNA. In one embodiment, RNA is extracted from
cells of the various types of interest using guanidinium thiocyanate lysis followed by CsCl
centrifugation to separate the RNA from DNA (Chirgwin et al., Biochemistry 18:5294-5299
(1979)). Poly(A)+ RNA is selected with oligo-dT cellulose (see Sambrook et
al., MOLECULAR CLONING - A LABORATORY MANUAL (2nd Ed.), Vols. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor, New York (1989)). Alternatively, separation of
RNA from DNA can be accomplished by organic extraction, for example, with hot phenol
or phenol/chloroform/isoamyl alcohol.

If desired, RNase inhibitors may be added to the lysis buffer. Likewise, for
certain cell types, it may be desirable to add a protein denaturation/digestion step to the
protocol.

For many applications, it is desirable to preferentially enrich mRNA with
respect to other cellular RNAs, such as transfer RNA (tRNA) and ribosomal RNA (rRNA).
Most mRNAs contain a poly(A) tail at their 3' end. This allows them to be enriched by
affinity chromatography, for example, using oligo(dT) or poly(U) coupled to a solid
support, such as cellulose or Sephadex™ (see Ausubel et al., CURRENT PROTOCOLS IN
MOLECULAR BIOLOGY, vol. 2, Current Protocols Publishing, New York (1994)). Once
bound, poly(A)+ mRNA is eluted from the affinity column using 2 mM EDTA/0.1% SDS.

The sample of RNA can comprise a plurality of different mRNA molecules,
each different mRNA molecule having a different nucleotide sequence. In a specific
embodiment, the mRNA molecules in the RNA sample comprise at least 100 different
nucleotide sequences. More preferably, the mRNA molecules of the RNA sample comprise
mRNA molecules corresponding to each of the marker genes. In another specific
embodiment, the RNA sample is a mammalian RNA sample.

In a specific embodiment, total RNA or mRNA from cells is used in the
methods of the invention. The source of the RNA can be cells of a plant or animal, human,
mammal, primate, non-human animal, dog, cat, mouse, rat, bird, yeast, eukaryote,
prokaryote, etc. In specific embodiments, the method of the invention is used with a sample
containing total mRNA or total RNA from $1 \times 10^6$ cells or less. In another embodiment,
proteins can be isolated from the foregoing sources, by methods known in the art, for use in
expression analysis at the protein level.

Probes to the homologs of the marker sequences disclosed herein can be
employed, preferably where non-human nucleic acid is being assayed.

5.4 METHODS OF USING BREAST CANCER MARKER SETS
5.4.1 DIAGNOSTIC METHODS
The present invention provides for methods of using the marker sets to
analyze a sample from an individual so as to determine the individual's tumor type or
subtype at a molecular level, whether a tumor is of the ER(+) or ER(-) type, and whether the
tumor is BRCA1-associated or sporadic. The individual need not actually be afflicted with
breast cancer. Essentially, the expression of specific marker genes in the individual, or a
sample taken therefrom, is compared to a standard or control. For example, assume two
breast cancer-related conditions, X and Y. One can compare the level of expression of
breast cancer prognostic markers for condition X in an individual to the level of the
marker-derived polynucleotides in a control, wherein the level represents the level of expression
exhibited by samples having condition X. In this instance, if the expression of the markers
in the individual's sample is substantially (i.e., statistically) different from that of the
control, then the individual does not have condition X. Where, as here, the choice is
bimodal (i.e., a sample is either X or Y), the individual can additionally be said to have
condition Y. Of course, the comparison to a control representing condition Y can also be
performed. Preferably both are performed simultaneously, such that each control acts as
both a positive and a negative control. The distinguishing result may thus either be a
demonstrable difference from the expression levels (i.e., the amount of marker-derived
RNA, or polynucleotides derived therefrom) represented by the control, or no significant
difference.

Thus, in one embodiment, the method of determining a particular
tumor-related status of an individual comprises the steps of (1) hybridizing labeled target
polynucleotides from an individual to a microarray containing one of the above marker sets;
(2) hybridizing standard or control polynucleotide molecules to the microarray, wherein the
standard or control molecules are differentially labeled from the target molecules; and (3)
determining the difference in transcript levels, or lack thereof, between the target and
standard or control, wherein the difference, or lack thereof, determines the individual's
tumor-related status. In a more specific embodiment, the standard or control molecules
comprise marker-derived polynucleotides from a pool of samples from normal individuals,
or a pool of tumor samples from individuals having sporadic-type tumors. In a preferred
embodiment, the standard or control is an artificially-generated pool of marker-derived
polynucleotides, which pool is designed to mimic the level of marker expression exhibited
by clinical samples of normal or breast cancer tumor tissue having a particular clinical
indication (i.e., cancerous or non-cancerous; ER(+) or ER(-) tumor; BRCA1- or sporadic
type tumor). In another specific embodiment, the control molecules comprise a pool
derived from normal or breast cancer cell lines.

The present invention provides sets of markers useful for distinguishing
ER(+) from ER(-) tumor types. Thus, in one embodiment of the above method, the level of
polynucleotides (i.e., mRNA or polynucleotides derived therefrom) in a sample from an
individual, expressed from the markers provided in Table 1, is compared to the level of
expression of the same markers from a control, wherein the control comprises
marker-related polynucleotides derived from ER(+) samples, ER(-) samples, or both. Preferably,
the comparison is to both ER(+) and ER(-), and preferably the comparison is to
polynucleotide pools from a number of ER(+) and ER(-) samples, respectively. Where the
individual's marker expression most closely resembles or correlates with the ER(+) control,
and does not resemble or correlate with the ER(-) control, the individual is classified as
ER(+). Where the pool is not pure ER(+) or ER(-), for example, where a sporadic pool is used, a
set of experiments using individuals with known ER status should be hybridized against the
pool, in order to define the expression templates for the ER(+) and ER(-) groups. Each
individual with unknown ER status is hybridized against the same pool and the expression
profile is compared to the template(s) to determine the individual's ER status.

The present invention provides sets of markers useful for distinguishing
BRCA1-related tumors from sporadic tumors. Thus, the method can be performed
substantially as for the ER(+/-) determination, with the exception that the markers are those
listed in Tables 3 and 4, and the control markers are a pool of marker-derived
polynucleotides from BRCA1 tumor samples, and a pool of marker-derived polynucleotides from
sporadic tumors. A patient is determined to have a BRCA1 germline mutation where the
expression of the individual's marker-derived polynucleotides most closely resembles, or is
most closely correlated with, that of the BRCA1 control. Where the control is not pure
BRCA1 or sporadic, two templates can be defined in a manner similar to that for ER status,
as described above.

For the above two embodiments of the method, the full set of markers may
be used (i.e., the complete set of markers for Tables 1 or 3). In other embodiments, subsets
of the markers may be used. In a preferred embodiment, the preferred markers listed in
Tables 2 or 4 are used.

The similarity between the marker expression profile of an individual and
that of a control can be assessed a number of ways. In the simplest case, the profiles can be
compared visually in a printout of expression difference data. Alternatively, the similarity
can be calculated mathematically.

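In one embodiment, the similarity measure between two patients x and y, or
patient x and a template y, can be calculated using the following equation:

$$S = 1 - \left[ \frac{\sum_{i=1}^{N} \frac{(x_i - \bar{x})}{\sigma_{x_i}} \frac{(y_i - \bar{y})}{\sigma_{y_i}}}{\sqrt{\sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{\sigma_{x_i}}\right)^2 \sum_{i=1}^{N} \left(\frac{y_i - \bar{y}}{\sigma_{y_i}}\right)^2}} \right] \qquad \text{Equation (5)}$$

In this equation, x and y are two patients with components of log ratio $x_i$ and $y_i$,
$i = 1, \ldots, N$ = 4,986. Associated with every value $x_i$ is error $\sigma_{x_i}$. The smaller the value $\sigma_{x_i}$,
the more reliable the measurement $x_i$. $\bar{x} = \sum_{i=1}^{N} \frac{x_i}{\sigma_{x_i}^2} \Big/ \sum_{i=1}^{N} \frac{1}{\sigma_{x_i}^2}$ is the error-weighted arithmetic
mean.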
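
A minimal Python sketch of this similarity measure follows. It assumes the form of Equation (5) in which each deviation from the error-weighted mean is scaled by its measurement error and the result is one minus the resulting correlation; the inverse-variance weighting in `weighted_mean` and both function names are assumptions made for illustration.

```python
import math

def weighted_mean(values, errors):
    """Error-weighted arithmetic mean, with assumed weights of 1/error^2."""
    weights = [1.0 / (e * e) for e in errors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def similarity(x, sx, y, sy):
    """One reading of Equation (5): 1 minus the error-scaled correlation of x and y.

    x, y  : log ratios for the same genes in two patients (or a patient and a template)
    sx, sy: the measurement errors associated with each log ratio
    """
    x_bar = weighted_mean(x, sx)
    y_bar = weighted_mean(y, sy)
    zx = [(xi - x_bar) / s for xi, s in zip(x, sx)]
    zy = [(yi - y_bar) / s for yi, s in zip(y, sy)]
    numerator = sum(a * b for a, b in zip(zx, zy))
    denominator = math.sqrt(sum(a * a for a in zx) * sum(b * b for b in zy))
    return 1.0 - numerator / denominator
```

Under this reading, S is 0 for perfectly correlated profiles and 2 for perfectly anti-correlated ones, so smaller values indicate closer profiles.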

In a preferred embodiment, templates are developed for sample comparison.
The template is defined as the error-weighted log ratio average of the expression difference
for the group of marker genes able to differentiate the particular breast cancer-related
condition. For example, templates are defined for ER(+) samples and for ER(-) samples.
Next, a classifier parameter is calculated. This parameter may be calculated using either
expression level differences between the sample and template, or by calculation of a
correlation coefficient. Such a coefficient, $P_i$, can be calculated using the following
equation:

$$P_i = \frac{\vec{z}_i \cdot \vec{y}}{\|\vec{z}_i\| \cdot \|\vec{y}\|} \qquad \text{Equation (1)}$$

where $\vec{z}_i$ is the expression template i, and $\vec{y}$ is the expression profile of a patient.
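
Template construction and classification by correlation might be sketched as follows. This is an illustration only: the correlation is taken as a normalized dot product between template and profile, the error weighting is assumed to be inverse-variance, and the names `build_template` and `classify` are hypothetical.

```python
import math

def build_template(profiles, errors):
    """Error-weighted average log ratio per marker across the samples of one group.

    profiles and errors are lists of per-sample vectors with one entry per marker.
    """
    template = []
    for m in range(len(profiles[0])):
        weights = [1.0 / (e[m] * e[m]) for e in errors]
        total = sum(w * p[m] for w, p in zip(weights, profiles))
        template.append(total / sum(weights))
    return template

def correlation(z, y):
    """Normalized dot product of template z and patient profile y (assumed form)."""
    dot = sum(a * b for a, b in zip(z, y))
    return dot / (math.sqrt(sum(a * a for a in z)) * math.sqrt(sum(b * b for b in y)))

def classify(profile, templates):
    """Return the best-correlated template name and all coefficients."""
    scores = {name: correlation(t, profile) for name, t in templates.items()}
    return max(scores, key=scores.get), scores
```

For example, with one template per ER status, a patient profile is assigned to whichever template yields the larger coefficient.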

Thus, in a more specific embodiment, the above method of determining a
particular tumor-related status of an individual comprises the steps of (1) hybridizing
labeled target polynucleotides from an individual to a microarray containing one of the
above marker sets; (2) hybridizing standard or control polynucleotide molecules to the
microarray, wherein the standard or control molecules are differentially labeled from the
target molecules; (3) determining the ratio (or difference) of transcript levels between
two channels (individual and control), or simply the transcript levels of the individual; and
(4) comparing the results from (3) to the predefined templates, wherein said determining is
accomplished by means of the statistic of Equation 1 or Equation 5, and wherein the
difference, or lack thereof, determines the individual's tumor-related status.

5.4.2 PROGNOSTIC METHODS

The present invention provides sets of markers useful for distinguishing
samples from those patients with a good prognosis from samples from patients with a poor
prognosis. Thus, the invention further provides a method for using these markers to
determine whether an individual afflicted with breast cancer will have a good or poor
clinical prognosis. In one embodiment, the invention provides a method of determining
whether an individual afflicted with breast cancer will likely experience a relapse within
five years of initial diagnosis (i.e., whether an individual has a poor prognosis) comprising
(1) comparing the level of expression of the markers listed in Table 5 in a sample taken
from the individual to the level of the same markers in a standard or control, where the
standard or control levels represent those found in an individual with a poor prognosis; and
(2) determining whether the level of the marker-related polynucleotides in the sample from
the individual is significantly different from that of the control, wherein if no substantial
difference is found, the patient has a poor prognosis, and if a substantial difference is found,
the patient has a good prognosis. Persons of skill in the art will readily see that the markers
associated with good prognosis can also be used as controls. In a more specific
embodiment, both controls are run. In case the pool is not pure 'good prognosis' or 'poor
prognosis', a set of experiments of individuals with known outcome should be hybridized
against the pool to define the expression templates for the good prognosis and poor
prognosis groups. Each individual with unknown outcome is hybridized against the same
pool and the resulting expression profile is compared to the templates to predict the
individual's outcome.

Poor prognosis of breast cancer may indicate that a tumor is relatively
aggressive, while good prognosis may indicate that a tumor is relatively nonaggressive.
Therefore, the invention provides for a method of determining a course of treatment of a
breast cancer patient, comprising determining whether the level of expression of the 231
markers of Table 5, or a subset thereof, correlates with the level of these markers in a
sample representing a good prognosis expression pattern or a poor prognosis pattern; and
determining a course of treatment, wherein if the expression correlates with the poor
prognosis pattern, the tumor is treated as an aggressive tumor.

As with the diagnostic markers, the method can use the complete set of
markers listed in Table 5. However, subsets of the markers may also be used. In a preferred
embodiment, the subset listed in Table 6 is used.

Classification of a sample as "good prognosis" or "poor prognosis" is
accomplished substantially as for the diagnostic markers described above, wherein a
template is generated to which the marker expression levels in the sample are compared.

The use of marker sets is not restricted to the prognosis of breast
cancer-related conditions, and may be applied in a variety of phenotypes or conditions, clinical or
experimental, in which gene expression plays a role. Where a set of markers has been
identified that corresponds to two or more phenotypes, the marker sets can be used to
distinguish these phenotypes. For example, the phenotypes may be the diagnosis and/or
prognosis of clinical states or phenotypes associated with other cancers, other disease
conditions, or other physiological conditions, wherein the expression level data is derived
from a set of genes correlated with the particular physiological or disease condition.

5.4.3 IMPROVING SENSITIVITY TO EXPRESSION LEVEL DIFFERENCES

In using the markers disclosed herein, and, indeed, using any sets of markers
to differentiate an individual having one phenotype from another individual having a second
phenotype, one can compare the absolute expression of each of the markers in a sample to a
control; for example, the control can be the average level of expression of each of the
markers, respectively, in a pool of individuals. To increase the sensitivity of the
comparison, however, the expression level values are preferably transformed in a number of
ways.

For example, the expression level of each of the markers can be normalized
by the average expression level of all markers whose expression level is determined,
or by the average expression level of a set of control genes. Thus, in one embodiment, the
markers are represented by probes on a microarray, and the expression level of each of the
markers is normalized by the mean or median expression level across all of the genes
represented on the microarray, including any non-marker genes. In a specific embodiment,
the normalization is carried out by dividing by the median or mean level of expression of all of
the genes on the microarray. In another embodiment, the expression levels of the markers are
normalized by the mean or median level of expression of a set of control markers. In a
specific embodiment, the control markers comprise a set of housekeeping genes. In another
specific embodiment, the normalization is accomplished by dividing by the median or mean
expression level of the control genes.
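
The two normalization options can be illustrated with a short Python sketch. The function name `normalize` and its parameters are hypothetical; the sketch simply divides each expression level by the mean or median of a reference set, which is either all measured genes or a supplied list of control (e.g., housekeeping) gene levels.

```python
from statistics import mean, median

def normalize(levels, reference=None, use_median=True):
    """Divide each expression level by the median (or mean) of a reference set.

    levels    : dict mapping gene name -> expression level on the array
    reference : optional list of control-gene levels; defaults to all levels
    """
    ref = list(reference) if reference is not None else list(levels.values())
    scale = median(ref) if use_median else mean(ref)
    return {gene: value / scale for gene, value in levels.items()}
```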

The sensitivity of a marker-based assay will also be increased if the
expression levels of individual markers are compared to the expression of the same markers
in a pool of samples. Preferably, the comparison is to the mean or median expression level
of each of the marker genes in the pool of samples. Such a comparison may be accomplished,
for example, by dividing the expression level of each of the markers in the sample by the
mean or median expression level of the pool for that marker. This has the effect of
accentuating the relative differences in expression between markers in the sample and
markers in the pool as a whole, making comparisons more sensitive and more likely to
produce meaningful results than the use of absolute expression levels alone. The expression
level data may be transformed in any convenient way; preferably, all of the expression level
data is log transformed before means or medians are taken.
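
As a sketch of the comparison to a pool, the following Python fragment log-transforms the data before averaging and then expresses each marker in a sample as a log ratio relative to the pool. The choice of base-10 logarithms and the function names are assumptions made for illustration.

```python
import math

def pool_log_means(pool_samples):
    """Mean of log10 expression for each marker across the pooled samples.

    pool_samples is a list of dicts mapping marker name -> expression level.
    """
    markers = list(pool_samples[0].keys())
    return {
        m: sum(math.log10(s[m]) for s in pool_samples) / len(pool_samples)
        for m in markers
    }

def log_ratio_to_pool(sample, pool_means):
    """Log ratio of each marker in the sample relative to the pool mean.

    Subtracting logs is equivalent to dividing the untransformed levels.
    """
    return {m: math.log10(sample[m]) - pool_means[m] for m in pool_means}
```

Because the pool means are plain numbers, they can be computed once and reused for any later sample.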

In performing comparisons to a pool, two approaches may be used. First, the
expression levels of the markers in the sample may be compared to the expression level of
those markers in the pool, where nucleic acid derived from the sample and nucleic acid
derived from the pool are hybridized during the course of a single experiment. Such an
approach requires that new pool nucleic acid be generated for each comparison or limited
numbers of comparisons, and is therefore limited by the amount of nucleic acid available.
Alternatively, and preferably, the expression levels in a pool, whether normalized and/or
transformed or not, are stored on a computer, or on computer-readable media, to be used in
comparisons to the individual expression level data from the sample (i.e., single-channel
data).

Thus, the current invention provides the following method of classifying a
first cell or organism as having one of at least two different phenotypes, where the different
phenotypes comprise a first phenotype and a second phenotype. The level of expression of
each of a plurality of genes in a first sample from the first cell or organism is compared to
the level of expression of each of said genes, respectively, in a pooled sample from a
plurality of cells or organisms, the plurality of cells or organisms comprising different cells
or organisms exhibiting said at least two different phenotypes, respectively, to produce a
first compared value. The first compared value is then compared to a second compared
value, wherein said second compared value is the product of a method comprising
comparing the level of expression of each of said genes in a sample from a cell or organism
characterized as having said first phenotype to the level of expression of each of said genes,
respectively, in the pooled sample. The first compared value is then compared to a third
compared value, wherein said third compared value is the product of a method comprising
comparing the level of expression of each of the genes in a sample from a cell or organism
characterized as having the second phenotype to the level of expression of each of the
genes, respectively, in the pooled sample. Optionally, the first compared value can be
compared to additional compared values, respectively, where each additional compared
value is the product of a method comprising comparing the level of expression of each of
said genes in a sample from a cell or organism characterized as having a phenotype different
from said first and second phenotypes but included among the at least two different
phenotypes, to the level of expression of each of said genes, respectively, in said pooled
sample. Finally, a determination is made as to which of said second, third, and, if present,
one or more additional compared values, said first compared value is most similar, wherein
the first cell or organism is determined to have the phenotype of the cell or organism used to
produce said compared value most similar to said first compared value.
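
The classification method just described can be sketched in Python as follows. This is an illustration under assumptions that the paragraph leaves open: each compared value is taken to be a vector of log10 ratios against the pooled sample, and "most similar" is taken to mean the smallest Euclidean distance. The function names are hypothetical.

```python
import math

def compared_value(sample, pool):
    """Vector of log10 ratios of sample to pooled expression, in a fixed gene order."""
    return [math.log10(sample[g] / pool[g]) for g in sorted(pool)]

def classify_by_compared_values(first, references):
    """Return the phenotype whose reference compared value is closest to `first`.

    references maps a phenotype name to the compared value obtained from a
    cell or organism characterized as having that phenotype.
    """
    def distance(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    return min(references, key=lambda name: distance(first, references[name]))
```

Any number of phenotypes can be supplied in `references`, which corresponds to the optional additional compared values.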

In a specific embodiment of this method, the compared values are each ratios
of the levels of expression of each of said genes. In another specific embodiment, each of
the levels of expression of each of the genes in the pooled sample are normalized prior to
any of the comparing steps. In a more specific embodiment, the normalization of the levels
of expression is carried out by dividing by the median or mean level of the expression of
each of the genes or dividing by the mean or median level of expression of one or more
housekeeping genes in the pooled sample from said cell or organism. In another specific
embodiment, the normalized levels of expression are subjected to a log transform, and the
comparing steps comprise subtracting the log transform from the log of the levels of
expression of each of the genes in the sample. In another specific embodiment, the two or
more different phenotypes are different stages of a disease or disorder. In still another
specific embodiment, the two or more different phenotypes are different prognoses of a
disease or disorder. In yet another specific embodiment, the levels of expression of each of
the genes, respectively, in the pooled sample or said levels of expression of each of said
genes in a sample from the cell or organism characterized as having the first phenotype,
second phenotype, or said phenotype different from said first and second phenotypes,
respectively, are stored on a computer or on a computer-readable medium.

In another specific embodiment, the two phenotypes are ER(+) or ER(-)
status. In another specific embodiment, the two phenotypes are BRCA1 or sporadic
tumor-type status. In yet another specific embodiment, the two phenotypes are good prognosis and
poor prognosis.

Of course, single-channel data may also be used without specific comparison
to a mathematical sample pool. For example, a sample may be classified as having a first or
a second phenotype, wherein the first and second phenotypes are related, by calculating the
similarity between the expression of at least 5 markers in the sample, where the markers are
correlated with the first or second phenotype, to the expression of the same markers in a
first phenotype template and a second phenotype template, by (a) labeling nucleic acids
derived from a sample with a fluorophore to obtain a pool of fluorophore-labeled nucleic
acids; (b) contacting said fluorophore-labeled nucleic acid with a microarray under
conditions such that hybridization can occur, detecting at each of a plurality of discrete loci
on the microarray a fluorescent emission signal from said fluorophore-labeled nucleic acid
that is bound to said microarray under said conditions; and (c) determining the similarity of
marker gene expression in the individual sample to the first and second templates, wherein
if said expression is more similar to the first template, the sample is classified as having the
first phenotype, and if said expression is more similar to the second template, the sample is
classified as having the second phenotype.

5.5 DETERMINATION OF MARKER GENE EXPRESSION LEVELS 

5.5.1 METHODS

The expression levels of the marker genes in a sample may be determined by
any means known in the art. The expression level may be determined by isolating and
determining the level (i.e., amount) of nucleic acid transcribed from each marker gene.
Alternatively, or additionally, the level of specific proteins translated from mRNA
transcribed from a marker gene may be determined.

The level of expression of specific marker genes can be determined by
measuring the amount of mRNA, or polynucleotides derived therefrom, present in a
sample. Any method for determining RNA levels can be used. For example, RNA is
isolated from a sample and separated on an agarose gel. The separated RNA is then
transferred to a solid support, such as a filter. Nucleic acid probes representing one or more
markers are then hybridized to the filter by northern hybridization, and the amount of
marker-derived RNA is determined. Such determination can be visual, or machine-aided,
for example, by use of a densitometer. Another method of determining RNA levels is by
use of a dot-blot or a slot-blot. In this method, RNA, or nucleic acid derived therefrom,
from a sample is labeled. The RNA or nucleic acid derived therefrom is then hybridized to
a filter containing oligonucleotides derived from one or more marker genes, wherein the
oligonucleotides are placed upon the filter at discrete, easily-identifiable locations.
Hybridization, or lack thereof, of the labeled RNA to the filter-bound oligonucleotides is
determined visually or by densitometer. Polynucleotides can be labeled using a radiolabel
or a fluorescent (i.e., visible) label.

These examples are not intended to be limiting; other methods of 
determining RNA abundance are known in the art. 

The level of expression of particular marker genes may also be assessed by
determining the level of the specific protein expressed from the marker genes. This can be
accomplished, for example, by separation of proteins from a sample on a polyacrylamide
gel, followed by identification of specific marker-derived proteins using antibodies in a
western blot. Alternatively, proteins can be separated by two-dimensional gel
electrophoresis systems. Two-dimensional gel electrophoresis is well-known in the art and
typically involves isoelectric focusing along a first dimension followed by SDS-PAGE
electrophoresis along a second dimension. See, e.g., Hames et al., 1990, Gel
Electrophoresis of Proteins: A Practical Approach, IRL Press, New York;
Shevchenko et al., Proc. Nat'l Acad. Sci. USA 93:1440-1445 (1996); Sagliocco et al., Yeast
12:1519-1533 (1996); Lander, Science 274:536-539 (1996). The resulting
electropherograms can be analyzed by numerous techniques, including mass spectrometric
techniques, western blotting and immunoblot analysis using polyclonal and monoclonal
antibodies.

Alternatively, marker-derived protein levels can be determined by
constructing an antibody microarray in which binding sites comprise immobilized,
preferably monoclonal, antibodies specific to a plurality of protein species encoded by the
cell genome. Preferably, antibodies are present for a substantial fraction of the
marker-derived proteins of interest. Methods for making monoclonal antibodies are well known
(see, e.g., Harlow and Lane, 1988, Antibodies: A Laboratory Manual, Cold Spring
Harbor, New York, which is incorporated in its entirety for all purposes). In one
embodiment, monoclonal antibodies are raised against synthetic peptide fragments designed
based on genomic sequence of the cell. With such an antibody array, proteins from the cell
are contacted to the array, and their binding is assayed with assays known in the art.
Generally, the expression, and the level of expression, of proteins of diagnostic or
prognostic interest can be detected through immunohistochemical staining of tissue slices or
sections.

Finally, expression of marker genes in a number of tissue specimens may be
characterized using a "tissue array" (Kononen et al., Nat. Med. 4(7):844-7 (1998)). In a
tissue array, multiple tissue samples are assessed on the same microarray. The arrays allow
in situ detection of RNA and protein levels; consecutive sections allow the analysis of
multiple samples simultaneously.

5.5.2 MICROARRAYS 
In preferred embodiments, polynucleotide microarrays are used to measure
expression so that the expression status of each of the markers above is assessed
simultaneously. In a specific embodiment, the invention provides for oligonucleotide or
cDNA arrays comprising probes hybridizable to the genes corresponding to each of the
marker sets described above (i.e., markers to determine the molecular type or subtype of a
tumor; markers to distinguish ER status; markers to distinguish BRCA1 from sporadic
tumors; markers to distinguish patients with good versus patients with poor prognosis;
markers to distinguish both ER(+) from ER(-), and BRCA1 tumors from sporadic tumors;
markers to distinguish ER(+) from ER(-), and patients with good prognosis from patients
with poor prognosis; markers to distinguish BRCA1 tumors from sporadic tumors, and
patients with good prognosis from patients with poor prognosis; markers able to
distinguish ER(+) from ER(-), BRCA1 tumors from sporadic tumors, and patients with good
prognosis from patients with poor prognosis; and markers unique to each status).

The microarrays provided by the present invention may comprise probes
hybridizable to the genes corresponding to markers able to distinguish the status of one,
two, or all three of the clinical conditions noted above. In particular, the invention provides
polynucleotide arrays comprising probes to a subset or subsets of at least 50, 100, 200, 300,
400, 500, 750, 1,000, 1,250, 1,500, 1,750, 2,000 or 2,250 genetic markers, up to the full set
of 2,460 markers, which distinguish ER(+) and ER(-) patients or tumors. The invention
also provides probes to subsets of at least 20, 30, 40, 50, 75, 100, 150, 200, 250, 300, 350 or
400 markers, up to the full set of 430 markers, which distinguish between tumors containing
a BRCA1 mutation and sporadic tumors within an ER(-) group of tumors. The invention
also provides probes to subsets of at least 20, 30, 40, 50, 75, 100, 150 or 200 markers, up to
the full set of 231 markers, which distinguish between patients with good and poor
prognosis within sporadic tumors. In a specific embodiment, the array comprises probes to
marker sets or subsets directed to any two of the clinical conditions. In a more specific
embodiment, the array comprises probes to marker sets or subsets directed to all three
clinical conditions.

In yet another specific embodiment, microarrays that are used in the methods
disclosed herein optionally comprise markers additional to at least some of the markers
listed in Tables 1-6. For example, in a specific embodiment, the microarray is a screening
or scanning array as described in Altschuler et al., International Publication WO 02/18646,
published March 7, 2002 and Scherer et al., International Publication WO 02/16650,
published February 28, 2002. The scanning and screening arrays comprise
regularly-spaced, positionally-addressable probes derived from genomic nucleic acid sequence, both
expressed and unexpressed. Such arrays may comprise probes corresponding to a subset of,
or all of, the markers listed in Tables 1-6, or a subset thereof as described above, and can be
used to monitor marker expression in the same way as a microarray containing only markers
listed in Tables 1-6.

In yet another specific embodiment, the microarray is a commercially-available
cDNA microarray that comprises at least five of the markers listed in Tables 1-6.
Preferably, a commercially-available cDNA microarray comprises all of the markers listed
in Tables 1-6. However, such a microarray may comprise 5, 10, 15, 25, 50, 100, 150, 250,
500, 1000 or more of the markers in any of Tables 1-6, up to the maximum number of
markers in a Table, and may comprise all of the markers in any one of Tables 1-6 and a
subset of another of Tables 1-6, or subsets of each as described above. In a specific
embodiment of the microarrays used in the methods disclosed herein, the markers that are
all or a portion of Tables 1-6 make up at least 50%, 60%, 70%, 80%, 90%, 95% or 98% of
the probes on the microarray.

General methods pertaining to the construction of microarrays comprising 
the marker sets and/or subsets above are described in the following sections. 

5.5.2.1 CONSTRUCTION OF MICROARRAYS

Microarrays are prepared by selecting probes which comprise a polynucleotide
sequence, and then immobilizing such probes to a solid support or surface. For example,
the probes may comprise DNA sequences, RNA sequences, or copolymer sequences of
DNA and RNA. The polynucleotide sequences of the probes may also comprise DNA
and/or RNA analogues, or combinations thereof. For example, the polynucleotide
sequences of the probes may be full or partial fragments of genomic DNA. The
polynucleotide sequences of the probes may also be synthesized nucleotide sequences, such
as synthetic oligonucleotide sequences. The probe sequences can be synthesized either
enzymatically in vivo, enzymatically in vitro (e.g., by PCR), or non-enzymatically in vitro.

The probe or probes used in the methods of the invention are preferably
immobilized to a solid support which may be either porous or non-porous. For example, the
probes of the invention may be polynucleotide sequences which are attached to a
nitrocellulose or nylon membrane or filter covalently at either the 3' or the 5' end of the
polynucleotide. Such hybridization probes are well known in the art (see, e.g., Sambrook et
al., Molecular Cloning - A Laboratory Manual (2nd Ed.), Vols. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor, New York (1989)). Alternatively, the solid support
or surface may be a glass or plastic surface. In a particularly preferred embodiment,
hybridization levels are measured to microarrays of probes consisting of a solid phase on the
surface of which is immobilized a population of polynucleotides, such as a population of
DNA or DNA mimics, or, alternatively, a population of RNA or RNA mimics. The solid
phase may be a nonporous or, optionally, a porous material such as a gel.

In preferred embodiments, a microarray comprises a support or surface with an
ordered array of binding (e.g., hybridization) sites or "probes" each representing one of the
markers described herein. Preferably the microarrays are addressable arrays, and more
preferably positionally addressable arrays. More specifically, each probe of the array is
preferably located at a known, predetermined position on the solid support such that the
identity (i.e., the sequence) of each probe can be determined from its position in the array
(i.e., on the support or surface). In preferred embodiments, each probe is covalently
attached to the solid support at a single site.
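The idea of a positionally addressable array can be illustrated with a minimal sketch in which each probe is assigned a fixed (row, column) position, so that the identity of a probe is recovered from its position alone. The function names, the row-major layout, and the marker identifiers below are hypothetical and are chosen only for illustration; they are not taken from the specification.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) on the support


def build_layout(marker_ids: List[str], n_columns: int) -> Dict[Position, str]:
    """Assign each marker a known, predetermined position in row-major order.

    The row-major ordering is an assumption made for this sketch.
    """
    if n_columns <= 0:
        raise ValueError("n_columns must be positive")
    # divmod(i, n_columns) yields (row, column) for the i-th probe.
    return {divmod(i, n_columns): marker for i, marker in enumerate(marker_ids)}


def probe_at(layout: Dict[Position, str], row: int, column: int) -> str:
    """Return the identity of the probe located at a given position."""
    return layout[(row, column)]


# Hypothetical marker identifiers laid out on a support two columns wide.
layout = build_layout(
    ["marker_A", "marker_B", "marker_C", "marker_D", "marker_E"], n_columns=2
)
```

With this layout, a signal read at row 1, column 0 is attributed to the third marker in the list, without any need to sequence or otherwise identify the probe at that site.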

Microarrays can be made in a number of ways, of which several are described
below. However produced, microarrays share certain characteristics. The arrays are
reproducible, allowing multiple copies of a given array to be produced and easily compared
with each other. Preferably, microarrays are made from materials that are stable under
binding (e.g., nucleic acid hybridization) conditions. The microarrays are preferably small,
e.g., between 1 cm² and 25 cm², between 12 cm² and 13 cm², or 3 cm². However, larger
arrays are also contemplated and may be preferable, e.g., for use in screening arrays.
Preferably, a given binding site or unique set of binding sites in the microarray will
specifically bind (e.g., hybridize) to the product of a single gene in a cell (e.g., to a specific
mRNA, or to a specific cDNA derived therefrom). However, in general, other related or
similar sequences will cross hybridize to a given binding site.

The microarrays of the present invention include one or more test probes, each of
which has a polynucleotide sequence that is complementary to a subsequence of RNA or
DNA to be detected. Preferably, the position of each probe on the solid surface is known.
Indeed, the microarrays are preferably positionally addressable arrays. Specifically, each
probe of the array is preferably located at a known, predetermined position on the solid
support such that the identity (i.e., the sequence) of each probe can be determined from its
position on the array (i.e., on the support or surface).

According to the invention, the microarray is an array (i.e., a matrix) in which each
position represents one of the markers described herein. For example, each position can
contain a DNA or DNA analogue based on genomic DNA to which a particular RNA or
cDNA transcribed from that genetic marker can specifically hybridize. The DNA or DNA
analogue can be, e.g., a synthetic oligomer or a gene fragment. In one embodiment, probes
representing each of the markers are present on the array. In a preferred embodiment, the
array comprises 550 of the 2,460 ER-status markers, 70 of the BRCA1/sporadic markers,
and all 231 of the prognosis markers.

5.5.2.2 PREPARING PROBES FOR MICROARRAYS 
As noted above, the "probe" to which a particular polynucleotide molecule
specifically hybridizes according to the invention contains a complementary genomic
polynucleotide sequence. The probes of the microarray preferably consist of nucleotide
sequences of no more than 1,000 nucleotides. In some embodiments, the probes of the array
consist of nucleotide sequences of 10 to 1,000 nucleotides. In a preferred embodiment, the
nucleotide sequences of the probes are in the range of 10-200 nucleotides in length and are
genomic sequences of a species of organism, such that a plurality of different probes is
present, with sequences complementary and thus capable of hybridizing to the genome of
such a species of organism, sequentially tiled across all or a portion of such genome. In
other specific embodiments, the probes are in the range of 10-30 nucleotides in length, in
the range of 10-40 nucleotides in length, in the range of 20-50 nucleotides in length, in the
range of 40-80 nucleotides in length, in the range of 50-150 nucleotides in length, in the
range of 80-120 nucleotides in length, and most preferably are 60 nucleotides in length.
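Sequential tiling of fixed-length probes across a sequence can be sketched as a sliding window. The following is an illustrative sketch only: the function name and the `step` parameter are hypothetical, the default length of 60 reflects the most preferred probe length stated above, and the windows returned are taken from the input strand (forming the complementary strand is a separate step not shown here).

```python
from typing import List, Tuple


def tile_probes(
    sequence: str, probe_length: int = 60, step: int = 60
) -> List[Tuple[int, str]]:
    """Return (start, window) pairs tiled across `sequence`.

    step == probe_length gives end-to-end tiling; a smaller step gives
    overlapping probes. Trailing bases that cannot fill a full window
    are not covered in this simple sketch.
    """
    if probe_length <= 0 or step <= 0:
        raise ValueError("probe_length and step must be positive")
    last_start = len(sequence) - probe_length
    return [
        (start, sequence[start:start + probe_length])
        for start in range(0, last_start + 1, step)
    ]


# A made-up 150-nucleotide sequence used purely as example input.
example_sequence = "ACGT" * 37 + "AC"
end_to_end = tile_probes(example_sequence)            # starts at 0 and 60
overlapping = tile_probes(example_sequence, step=30)  # starts at 0, 30, 60, 90
```

Choosing a step smaller than the probe length trades a larger number of probes for denser coverage of the region of interest.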

The probes may comprise DNA or DNA "mimics" (e.g., derivatives and analogues)
corresponding to a portion of an organism's genome. In another embodiment, the probes of
the microarray are complementary RNA or RNA mimics. DNA mimics are polymers
composed of subunits capable of specific, Watson-Crick-like hybridization with DNA, or of
specific hybridization with RNA. The nucleic acids can be modified at the base moiety, at
the sugar moiety, or at the phosphate backbone. Exemplary DNA mimics include, e.g.,
phosphorothioates.

DNA can be obtained, e.g., by polymerase chain reaction (PCR) amplification of
genomic DNA or cloned sequences. PCR primers are preferably chosen based on a known
sequence of the genome that will result in amplification of specific fragments of genomic
DNA. Computer programs that are well known in the art are useful in the design of primers
with the required specificity and optimal amplification properties, such as Oligo version 5.0
(National Biosciences). Typically each probe on the microarray will be between 10 bases
and 50,000 bases, usually between 300 bases and 1,000 bases in length. PCR methods are
well known in the art, and are described, for example, in Innis et al., eds., PCR Protocols:
A Guide to Methods and Applications, Academic Press Inc., San Diego, CA (1990). It
will be apparent to one skilled in the art that controlled robotic systems are useful for
isolating and amplifying nucleic acids.

An alternative, preferred means for generating the polynucleotide probes of the
microarray is by synthesis of synthetic polynucleotides or oligonucleotides, e.g., using
H-phosphonate or phosphoramidite chemistries (Froehler et al., Nucleic Acid Res. 14:5399-5407
(1986); McBride et al., Tetrahedron Lett. 24:246-248 (1983)). Synthetic sequences
are typically between about 10 and about 500 bases in length, more typically between about
20 and about 100 bases, and most preferably between about 40 and about 70 bases in length.
In some embodiments, synthetic nucleic acids include non-natural bases, such as, but by no
means limited to, inosine. As noted above, nucleic acid analogues may be used as binding
sites for hybridization. An example of a suitable nucleic acid analogue is peptide nucleic
acid (see, e.g., Egholm et al., Nature 363:566-568 (1993); U.S. Patent No. 5,539,083).
Probes are preferably selected using an algorithm that takes into account binding energies,
base composition, sequence complexity, cross-hybridization binding energies, and
secondary structure (see Friend et al., International Patent Publication WO 01/05935,
published January 25, 2001; Hughes et al., Nat. Biotech. 19:342-7 (2001)).
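Two of the selection criteria named above, base composition and sequence complexity, lend themselves to a simple illustration. The sketch below is not the algorithm of the cited references: it filters candidates on GC fraction and on the longest single-base run only, the thresholds are hypothetical, and binding energies, cross-hybridization, and secondary structure are deliberately left out.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base (crude complexity proxy)."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def passes_filters(
    seq: str, gc_min: float = 0.35, gc_max: float = 0.65, max_run: int = 6
) -> bool:
    """Keep a candidate probe only if it satisfies both illustrative criteria.

    The default thresholds are assumptions made for this sketch.
    """
    if not seq:
        return False
    return gc_min <= gc_fraction(seq) <= gc_max and longest_run(seq.upper()) <= max_run
```

A fuller selection procedure would rank the surviving candidates by predicted hybridization behavior rather than apply fixed cutoffs alone.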

A skilled artisan will also appreciate that positive control probes, e.g., probes known
to be complementary and hybridizable to sequences in the target polynucleotide molecules,
and negative control probes, e.g., probes known to not be complementary and hybridizable
to sequences in the target polynucleotide molecules, should be included on the array. In one
embodiment, positive controls are synthesized along the perimeter of the array. In another
embodiment, positive controls are synthesized in diagonal stripes across the array. In still
another embodiment, the reverse complement for each probe is synthesized next to the
position of the probe to serve as a negative control. In yet another embodiment, sequences
from other species of organism are used as negative controls or as "spike-in" controls.
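The embodiment in which the reverse complement of each probe is placed next to the probe as a negative control can be sketched as follows. The function names and the `_neg` naming suffix are hypothetical, and the sketch assumes DNA probes written with the letters A, C, G, and T.

```python
from typing import List, Tuple

# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_negative_controls(probes: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Interleave each (name, sequence) probe with its reverse-complement control.

    The control is emitted immediately after its probe, so the two occupy
    adjacent positions when the list is laid out in order on the array.
    """
    ordered = []
    for name, seq in probes:
        ordered.append((name, seq))
        ordered.append((name + "_neg", reverse_complement(seq)))
    return ordered
```

Because the list alternates probe and control, any layout that fills positions in list order places each negative control beside the probe it is derived from.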



5.5.2.3 ATTACHING PROBES TO THE SOLID SURFACE
The probes are attached to a solid support or surface, which may be made, e.g., from
glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, gel, or other
porous or nonporous material. A preferred method for attaching the nucleic acids to a
surface is by printing on glass plates, as is described generally by Schena et al., Science
270:467-470 (1995). This method is especially useful for preparing microarrays of cDNA
(See also, DeRisi et al., Nature Genetics 14:457-460 (1996); Shalon et al., Genome Res.
6:639-645 (1996); and Schena et al., Proc. Natl. Acad. Sci. U.S.A. 93:10539-11286 (1995)).
A second preferred method for making microarrays is by making high-density
oligonucleotide arrays. Techniques are known for producing arrays containing thousands of
oligonucleotides complementary to defined sequences, at defined locations on a surface
using photolithographic techniques for synthesis in situ (see, Fodor et al., 1991, Science
251:767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:5022-5026; Lockhart et
al., 1996, Nature Biotechnology 14:1675; U.S. Patent Nos. 5,578,832; 5,556,752; and
5,510,270) or other methods for rapid synthesis and deposition of defined oligonucleotides
(Blanchard et al., Biosensors & Bioelectronics 11:687-690). When these methods are used,
oligonucleotides (e.g., 60-mers) of known sequence are synthesized directly on a surface
such as a derivatized glass slide. Usually, the array produced is redundant, with several
oligonucleotide molecules per RNA.

Other methods for making microarrays, e.g., by masking (Maskos and Southern,
1992, Nuc. Acids. Res. 20:1679-1684), may also be used. In principle, and as noted supra,
any type of array, for example, dot blots on a nylon hybridization membrane (see Sambrook
et al., Molecular Cloning - A Laboratory Manual (2nd Ed.), Vols. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor, New York (1989)) could be used. However, as
will be recognized by those skilled in the art, very small arrays will frequently be preferred
because hybridization volumes will be smaller.

In one embodiment, the arrays of the present invention are prepared by synthesizing 
polynucleotide probes on a support. In such an embodiment, polynucleotide probes are 
attached to the support covalently at either the 3' or the 5' end of the polynucleotide. 

In a particularly preferred embodiment, microarrays of the invention are
manufactured by means of an ink jet printing device for oligonucleotide synthesis, e.g.,
using the methods and systems described by Blanchard in U.S. Pat. No. 6,028,189;
Blanchard et al., 1996, Biosensors and Bioelectronics 11:687-690; Blanchard, 1998, in
Synthetic DNA Arrays in Genetic Engineering, Vol. 20, J.K. Setlow, Ed., Plenum
Press, New York at pages 111-123. Specifically, the oligonucleotide probes in such
microarrays are preferably synthesized in arrays, e.g., on a glass slide, by serially depositing
individual nucleotide bases in "microdroplets" of a high surface tension solvent such as
propylene carbonate. The microdroplets have small volumes (e.g., 100 pL or less, more
preferably 50 pL or less) and are separated from each other on the microarray (e.g., by
hydrophobic domains) to form circular surface tension wells which define the locations of
the array elements (i.e., the different probes). Microarrays manufactured by this ink-jet
method are typically of high density, preferably having a density of at least about 2,500
different probes per 1 cm². The polynucleotide probes are attached to the support covalently
at either the 3' or the 5' end of the polynucleotide.

5.5.2.4 TARGET POLYNUCLEOTIDE MOLECULES 
The polynucleotide molecules which may be analyzed by the present invention (the
"target polynucleotide molecules") may be from any clinically relevant source, but are
expressed RNA or a nucleic acid derived therefrom (e.g., cDNA or amplified RNA derived
from cDNA that incorporates an RNA polymerase promoter), including naturally occurring
nucleic acid molecules, as well as synthetic nucleic acid molecules. In one embodiment, the
target polynucleotide molecules comprise RNA, including, but by no means limited to, total
cellular RNA, poly(A)+ messenger RNA (mRNA) or fraction thereof, cytoplasmic mRNA,
or RNA transcribed from cDNA (i.e., cRNA; see, e.g., Linsley & Schelter, U.S. Patent
Application No. 09/411,074, filed October 4, 1999, or U.S. Patent Nos. 5,545,522,
5,891,636, or 5,716,785). Methods for preparing total and poly(A)+ RNA are well known in
the art, and are described generally, e.g., in Sambrook et al., Molecular Cloning - A
Laboratory Manual (2nd Ed.), Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring
Harbor, New York (1989). In one embodiment, RNA is extracted from cells of the various
types of interest in this invention using guanidinium thiocyanate lysis followed by CsCl
centrifugation (Chirgwin et al., 1979, Biochemistry 18:5294-5299). In another
embodiment, total RNA is extracted using a silica gel-based column, commercially
available examples of which include RNeasy (Qiagen, Valencia, California) and StrataPrep
(Stratagene, La Jolla, California). In an alternative embodiment, which is preferred for S.
cerevisiae, RNA is extracted from cells using phenol and chloroform, as described in
Ausubel et al., eds., 1989, Current Protocols in Molecular Biology, Vol. III, Green
Publishing Associates, Inc., John Wiley & Sons, Inc., New York, at pp. 13.12.1-13.12.5.
Poly(A)+ RNA can be selected, e.g., by selection with oligo-dT cellulose or, alternatively,
by oligo-dT primed reverse transcription of total cellular RNA. In one embodiment, RNA
can be fragmented by methods known in the art, e.g., by incubation with ZnCl2, to generate
fragments of RNA. In another embodiment, the polynucleotide molecules analyzed by the
invention comprise cDNA, or PCR products of amplified RNA or cDNA.

In one embodiment, total RNA, mRNA, or nucleic acids derived therefrom, is
isolated from a sample taken from a person afflicted with breast cancer. Target
polynucleotide molecules that are poorly expressed in particular cells may be enriched using
normalization techniques (Bonaldo et al., 1996, Genome Res. 6:791-806).

As described above, the target polynucleotides are detectably labeled at one or more
nucleotides. Any method known in the art may be used to detectably label the target
polynucleotides. Preferably, this labeling incorporates the label uniformly along the length
of the RNA, and more preferably, the labeling is carried out at a high degree of efficiency.
One embodiment for this labeling uses oligo-dT primed reverse transcription to incorporate
the label; however, conventional implementations of this method are biased toward generating 3'
end fragments. Thus, in a preferred embodiment, random primers (e.g., 9-mers) are used in
reverse transcription to uniformly incorporate labeled nucleotides over the full length of the
target polynucleotides. Alternatively, random primers may be used in conjunction with
PCR methods or T7 promoter-based in vitro transcription methods in order to amplify the
target polynucleotides.

In a preferred embodiment, the detectable label is a luminescent label. For example,
fluorescent labels, bio-luminescent labels, chemi-luminescent labels, and colorimetric labels
may be used in the present invention. In a highly preferred embodiment, the label is a
fluorescent label, such as a fluorescein, a phosphor, a rhodamine, or a polymethine dye
derivative. Examples of commercially available fluorescent labels include, for example,
fluorescent phosphoramidites such as FluorePrime (Amersham Pharmacia, Piscataway,
N.J.), Fluoredite (Millipore, Bedford, Mass.), FAM (ABI, Foster City, Calif.), and Cy3 or
Cy5 (Amersham Pharmacia, Piscataway, N.J.). In another embodiment, the detectable label
is a radiolabeled nucleotide.

In a further preferred embodiment, target polynucleotide molecules from a patient
sample are labeled differentially from target polynucleotide molecules of a standard. The
standard can comprise target polynucleotide molecules from normal individuals (i.e., those
not afflicted with breast cancer). In a highly preferred embodiment, the standard comprises
target polynucleotide molecules pooled from samples from normal individuals or tumor
samples from individuals having sporadic-type breast tumors. In another embodiment, the
target polynucleotide molecules are derived from the same individual, but are taken at
different time points, and thus indicate the efficacy of a treatment by a change in expression
of the markers, or lack thereof, during and after the course of treatment (i.e., chemotherapy,
radiation therapy or cryotherapy), wherein a change in the expression of the markers from a
poor prognosis pattern to a good prognosis pattern indicates that the treatment is efficacious.
In this embodiment, different timepoints are differentially labeled.

5.5.2.5 HYBRIDIZATION TO MICROARRAYS

Nucleic acid hybridization and wash conditions are chosen so that the target 
polynucleotide molecules specifically bind or specifically hybridize to the complementary 
polynucleotide sequences of the array, preferably to a specific array site, wherein its 
complementary DNA is located. 

Arrays containing double-stranded probe DNA situated thereon are preferably
subjected to denaturing conditions to render the DNA single-stranded prior to contacting
with the target polynucleotide molecules. Arrays containing single-stranded probe DNA
(e.g., synthetic oligodeoxyribonucleic acids) may need to be denatured prior to contacting
with the target polynucleotide molecules, e.g., to remove hairpins or dimers which form due
to self complementary sequences.

Optimal hybridization conditions will depend on the length (e.g., oligomer versus
polynucleotide greater than 200 bases) and type (e.g., RNA, or DNA) of probe and target
nucleic acids. One of skill in the art will appreciate that as the oligonucleotides become
shorter, it may become necessary to adjust their length to achieve a relatively uniform
melting temperature for satisfactory hybridization results. General parameters for specific
(i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al.,
Molecular Cloning - A Laboratory Manual (2nd Ed.), Vols. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor, New York (1989), and in Ausubel et al., Current
Protocols in Molecular Biology, vol. 2, Current Protocols Publishing, New York
(1994). Typical hybridization conditions for the cDNA microarrays of Schena et al. are
hybridization in 5 X SSC plus 0.2% SDS at 65 °C for four hours, followed by washes at 25
°C in low stringency wash buffer (1 X SSC plus 0.2% SDS), followed by 10 minutes at 25
°C in higher stringency wash buffer (0.1 X SSC plus 0.2% SDS) (Schena et al., Proc. Natl.
Acad. Sci. U.S.A. 93:10614 (1993)). Useful hybridization conditions are also provided in,
e.g., Tijessen, 1993, HYBRIDIZATION WITH NUCLEIC ACID PROBES, Elsevier Science
Publishers B.V.; and Kricka, 1992, NONISOTOPIC DNA PROBE TECHNIQUES, Academic
Press, San Diego, CA.

Particularly preferred hybridization conditions include hybridization at a temperature
at or near the mean melting temperature of the probes (e.g., within 5 °C, more preferably
within 2 °C) in 1 M NaCl, 50 mM MES buffer (pH 6.5), 0.5% sodium sarcosine and 30%
formamide.

5.5.2.6 SIGNAL DETECTION AND DATA ANALYSIS 

When fluorescently labeled probes are used, the fluorescence emissions at each site
of a microarray may be, preferably, detected by scanning confocal laser microscopy. In one
embodiment, a separate scan, using the appropriate excitation line, is carried out for each of
the two fluorophores used. Alternatively, a laser may be used that allows simultaneous
specimen illumination at wavelengths specific to the two fluorophores and emissions from
the two fluorophores can be analyzed simultaneously (see Shalon et al., 1996, "A DNA
microarray system for analyzing complex DNA samples using two-color fluorescent probe
hybridization," Genome Research 6:639-645, which is incorporated by reference in its
entirety for all purposes). In a preferred embodiment, the arrays are scanned with a laser
fluorescent scanner with a computer controlled X-Y stage and a microscope objective.
Sequential excitation of the two fluorophores is achieved with a multi-line, mixed gas laser
and the emitted light is split by wavelength and detected with two photomultiplier tubes.
Fluorescence laser scanning devices are described in Schena et al., Genome Res. 6:639-645
(1996), and in other references cited herein. Alternatively, the fiber-optic bundle described
by Ferguson et al., Nature Biotech. 14:1681-1684 (1996), may be used to monitor mRNA
abundance levels at a large number of sites simultaneously.

Signals are recorded and, in a preferred embodiment, analyzed by computer, e.g.,
using a 12 or 16 bit analog to digital board. In one embodiment the scanned image is
despeckled using a graphics program (e.g., Hijaak Graphics Suite) and then analyzed using
an image gridding program that creates a spreadsheet of the average hybridization at each
wavelength at each site. If necessary, an experimentally determined correction for "cross
talk" (or overlap) between the channels for the two fluors may be made. For any particular
hybridization site on the transcript array, a ratio of the emission of the two fluorophores can
be calculated. The ratio is independent of the absolute expression level of the cognate gene,
but is useful for genes whose expression is significantly modulated in association with the
different breast cancer-related condition.
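The two-channel ratio calculation, with an optional cross-talk correction, can be sketched as follows. This is an illustrative sketch only: the linear spillover model, the coefficient names, and the use of a base-10 logarithm are assumptions made here, and in practice the spillover coefficients would have to be determined experimentally as noted above.

```python
import math
from typing import Optional, Tuple


def correct_cross_talk(
    observed_1: float,
    observed_2: float,
    leak_1_into_2: float = 0.0,
    leak_2_into_1: float = 0.0,
) -> Tuple[float, float]:
    """Undo linear spillover between two fluorescence channels.

    Assumed model (hypothetical):
        observed_1 = true_1 + leak_2_into_1 * true_2
        observed_2 = true_2 + leak_1_into_2 * true_1
    """
    det = 1.0 - leak_1_into_2 * leak_2_into_1
    if det == 0.0:
        raise ValueError("spillover coefficients make the system singular")
    true_1 = (observed_1 - leak_2_into_1 * observed_2) / det
    true_2 = (observed_2 - leak_1_into_2 * observed_1) / det
    return true_1, true_2


def log_ratio(channel_1: float, channel_2: float) -> Optional[float]:
    """log10 of the channel ratio at one site; None if either signal is not positive."""
    if channel_1 <= 0 or channel_2 <= 0:
        return None
    return math.log10(channel_1 / channel_2)
```

Because only the ratio is used, scaling both channels by the same factor leaves the result unchanged, which is the sense in which the ratio does not depend on the absolute expression level of the gene.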

5.6 COMPUTER-FACILITATED ANALYSIS 
The present invention further provides for kits comprising the marker sets
above. In a preferred embodiment, the kit contains a microarray ready for hybridization to
target polynucleotide molecules, plus software for the data analyses described above.

The analytic methods described in the previous sections can be implemented
by use of the following computer systems and according to the following programs and
methods. A computer system comprises internal components linked to external
components. The internal components of a typical computer system include a processor
element interconnected with a main memory. For example, the computer system can be an
Intel 8086-, 80386-, 80486-, Pentium™, or Pentium™-based processor with preferably 32
MB or more of main memory.

The external components may include mass storage. This mass storage can
be one or more hard disks (which are typically packaged together with the processor and
memory). Such hard disks are preferably of 1 GB or greater storage capacity. Other
external components include a user interface device, which can be a monitor, together with
an inputting device, which can be a "mouse", or other graphic input devices, and/or a
keyboard. A printing device can also be attached to the computer.

Typically, a computer system is also linked to a network link, which can be
part of an Ethernet link to other local computer systems, remote computer systems, or wide
area communication networks, such as the Internet. This network link allows the computer
system to share data and processing tasks with other computer systems.

Loaded into memory during operation of this system are several software
components, which are both standard in the art and special to the instant invention. These
software components collectively cause the computer system to function according to the
methods of this invention. These software components are typically stored on the mass
storage device. A software component comprises the operating system, which is
responsible for managing the computer system and its network interconnections. This
operating system can be, for example, of the Microsoft Windows® family, such as
Windows 3.1, Windows 95, Windows 98, Windows 2000, or Windows NT. Another software
component represents common languages and functions conveniently present on this system
to assist programs implementing the methods specific to this invention. Many high or low
level computer languages can be used to program the analytic methods of this invention.
Instructions can be interpreted during run-time or compiled. Preferred languages include
C/C++, FORTRAN and JAVA. Most preferably, the methods of this invention are
programmed in mathematical software packages that allow symbolic entry of equations and
high-level specification of processing, including some or all of the algorithms to be used,
thereby freeing a user of the need to procedurally program individual equations or
algorithms. Such packages include Matlab from Mathworks (Natick, MA), Mathematica®
from Wolfram Research (Champaign, IL), or S-Plus® from MathSoft (Cambridge, MA).
Specifically, the software component includes the analytic methods of the invention as
programmed in a procedural language or symbolic package.

The software to be included with the kit comprises the data analysis methods
of the invention as disclosed herein. In particular, the software may include mathematical
routines for marker discovery, including the calculation of correlation coefficients between
clinical categories (i.e., ER status) and marker expression. The software may also include
mathematical routines for calculating the correlation between sample marker expression and
control marker expression, using array-generated fluorescence data, to determine the clinical
classification of a sample.

In an exemplary implementation, to practice the methods of the present
invention, a user first loads experimental data into the computer system. These data can be
directly entered by the user from a monitor, keyboard, or from other computer systems
linked by a network connection, or on removable storage media such as a CD-ROM, floppy
disk (not illustrated), tape drive (not illustrated), ZIP® drive (not illustrated) or through the
network. Next the user causes execution of expression profile analysis software which
performs the methods of the present invention.

In another exemplary implementation, a user first loads experimental data
and/or databases into the computer system. This data is loaded into the memory from the
storage media or from a remote computer, preferably from a dynamic geneset database
system, through the network. Next the user causes execution of software that performs the
steps of the present invention.

Alternative computer systems and software for implementing the analytic
methods of this invention will be apparent to one of skill in the art and are intended to be
comprehended within the accompanying claims. In particular, the accompanying claims are
intended to include the alternative program structures for implementing the methods of this
invention that will be readily apparent to one of skill in the art.

6. EXAMPLES 

Materials And Methods 

117 tumor samples from breast cancer patients were collected. RNA
samples were then prepared, and each RNA sample was profiled using inkjet-printed
microarrays. Marker genes were then identified based on expression patterns; these genes
were then used to train classifiers, which used these marker genes to classify tumors into
diagnostic and prognostic categories. Finally, these marker genes were used to predict the
diagnostic and prognostic outcome for a group of individuals.

1. Sample collection

117 breast cancer patients treated at The Netherlands Cancer Institute /
Antoni van Leeuwenhoek Hospital, Amsterdam, The Netherlands, were selected on the
basis of the following clinical criteria (data extracted from the medical records of the
NKI/AvL Tumor Register, Biometrics Department).

Group 1 (n=97, 78 for training, 19 for independent tests) was selected on the
basis of: (1) primary invasive breast carcinoma <5 cm (T1 or T2); (2) no axillary
metastases (N0); (3) age at diagnosis <55 years; (4) calendar year of diagnosis 1983-1996;
and (5) no prior malignancies (excluding carcinoma in situ of the cervix or basal cell
carcinoma of the skin). All patients were treated by modified radical mastectomy (n=34) or
breast conserving treatment (n=64), including axillary lymph node dissection. Breast
conserving treatment consisted of excision of the tumor, followed by radiation of the whole
breast to a dose of 50 Gy, followed by a boost varying from 15 to 25 Gy. Five patients
received adjuvant systemic therapy consisting of chemotherapy (n=3) or hormonal therapy
(n=2); all other patients did not receive additional treatment. All patients were followed at
least annually for a period of at least 5 years. Patient follow-up information was extracted
from the Tumor Registry of the Biometrics Department.

Group 2 (n=20) was selected as: (1) carriers of a germline mutation in
BRCA1 or BRCA2; and (2) having primary invasive breast carcinoma. No selection or
exclusion was made based on tumor size, lymph node status, age at diagnosis, calendar year
of diagnosis, or other malignancies. Germline mutation status was known prior to this
research protocol.

Information about the individuals from whom tumor samples were collected
includes: year of birth; sex; whether the individual is pre- or post-menopausal; the year of
diagnosis; the number of positive lymph nodes and the total number of nodes; whether there
was surgery, and if so, whether the surgery was breast-conserving or radical; whether there
was radiotherapy, chemotherapy or hormonal therapy. The tumor was graded according to
the formula P=TNM, where T is the tumor size (on a scale of 0-5); N is the number of
nodes that are positive (on a scale of 0-4); and M is metastases (0 = absent, 1 = present).
The tumor was also classified according to stage, tumor type (in situ or invasive; lobular or
ductal; grade) and the presence or absence of the estrogen and progesterone receptors. The
progression of the cancer was described by (where applicable): distant metastases; year of
distant metastases, year of death, year of last follow-up; and BRCA1 genotype.

2. Tumors:

Germline mutation testing of BRCA1 and BRCA2 on DNA isolated from
peripheral blood lymphocytes includes mutation screening by a Protein Truncation Test
(PTT) of exon 11 of BRCA1 and exons 10 and 11 of BRCA2, deletion PCR of BRCA1
genomic deletion of exons 13 and 22, as well as Denaturing Gradient Gel Electrophoresis
(DGGE) of the remaining exons. Aberrant bands were all confirmed by genomic
sequencing analyzed on an ABI3700 automatic sequencer and confirmed on an independent
DNA sample.

For all patients, tumor material was snap frozen in liquid nitrogen within one hour after
surgery. Of the frozen tumor material an H&E (hematoxylin-eosin) stained section was prepared
prior to and after cutting slides for RNA isolation. These H&E frozen sections were
assessed for the percentage of tumor cells; only samples with >50% tumor cells were
selected for further study.

For all tumors, surgical specimens fixed in formaldehyde and embedded in
paraffin were evaluated according to standard histopathological procedures. H&E stained
paraffin sections were examined to assess tumor type (e.g., ductal or lobular according to
the WHO classification); to assess histologic grade according to the method described by
Elston and Ellis (grade 1-3); and to assess the presence of lymphangio-invasive growth and
the presence of an extensive lymphocytic infiltrate. All histologic factors were
independently assessed by two pathologists (MV and JL); consensus on differences was
reached by examining the slides together. A representative slide of each tumor was used for
immunohistochemical staining with antibodies directed against the estrogen- and
progesterone receptor by standard procedures. The staining result was scored as the
percentage of positively staining nuclei (0%, 10%, 20%, etc., up to 100%).

3. Amplification, labeling, and hybridization 

The production of marker-derived nucleic acids and the
hybridization of the nucleic acids to a microarray are outlined in FIG. 2. 30 frozen sections
of 30 µm thickness were used for total RNA isolation of each snap frozen tumor specimen.
Total RNA was isolated with RNAzol™ B (Campro Scientific, Veenendaal, The
Netherlands) according to the manufacturer's protocol, including homogenization of the
tissue using a Polytron PT-MR2100 (Merck, Amsterdam, The Netherlands) and finally
dissolved in RNase-free H2O. The quality of the total RNA was assessed by the A260/A280
ratio, which had to be between 1.7 and 2.1, as well as by visual inspection of the RNA on an
agarose gel, which should indicate a stronger 28S ribosomal RNA band compared to the 18S
ribosomal RNA band. Subsequently, 25 µg of total RNA was DNase treated using the
Qiagen RNase-free DNase kit and RNeasy spin columns (Qiagen Inc., GmbH, Germany)
according to the manufacturer's protocol. DNase treated total RNA was dissolved in
RNase-free H2O to a final concentration of 0.2 µg/µl.

5 µg total RNA was used as input for cRNA synthesis. An oligo-dT primer
containing a T7 RNA polymerase promoter sequence was used to prime first strand cDNA
synthesis, and random primers (pdN6) were used to prime second strand cDNA synthesis by
MMLV reverse transcriptase. This reaction yielded a double-stranded cDNA that contained
the T7 RNA polymerase (T7RNAP) promoter. The double-stranded cDNA was then
transcribed into cRNA by T7RNAP.

cRNA was labeled with Cy3 or Cy5 dyes using a two-step process. First,
allylamine-derivatized nucleotides were enzymatically incorporated into cRNA products.
For cRNA labeling, a 3:1 mixture of 5-(3-Aminoallyl)uridine 5'-triphosphate (Sigma) and
UTP was substituted for UTP in the in vitro transcription (IVT) reaction. Allylamine-derivatized
cRNA products were then reacted with N-hydroxy succinimide esters of Cy3 or
Cy5 (CyDye, Amersham Pharmacia Biotech). 5 µg Cy5-labeled cRNA from one breast
cancer patient was mixed with the same amount of Cy3-labeled product from a pool of
equal amounts of cRNA from each individual sporadic patient.

Microarray hybridizations were done in duplicate with fluor reversals.
Before hybridization, labeled cRNAs were fragmented to an average size of ~50-100 nt by
heating at 60 °C in the presence of 10 mM ZnCl2. Fragmented cRNAs were added to
hybridization buffer containing 1 M NaCl, 0.5% sodium sarcosine and 50 mM MES, pH
6.5, the stringency of which was regulated by the addition of formamide to a final
concentration of 30%. Hybridizations were carried out in a final volume of 3 ml at 40 °C on a
rotating platform in a hybridization oven (Robbins Scientific) for 48 h. After hybridization, slides
were washed and scanned using a confocal laser scanner (Agilent Technologies).
Fluorescence intensities on scanned images were quantified, normalized and corrected.

4. Pooling of samples 

The reference cRNA pool was formed by pooling equal amounts of cRNA
from each individual sporadic patient, for a total of 78 tumors.

5. 25k human microarray

Surface-bound oligonucleotides were synthesized essentially as proposed by
Blanchard et al., Biosens. Bioelectron. 6(7):687-690 (1996); see also Hughes et al., Nature
Biotech. 19(4):342-347 (2000). Hydrophobic glass surfaces (3 inches by 3 inches)
containing exposed hydroxyl groups were used as substrates for nucleotide synthesis.
Phosphoramidite monomers were delivered to computer-defined positions on the glass
surfaces using ink-jet printer heads. Unreacted monomers were then washed away and the
ends of the extended oligonucleotides were deprotected. This cycle of monomer coupling,
washing and deprotection was repeated for each desired layer of nucleotide synthesis.
Oligonucleotide sequences to be printed were specified by computer files.

Microarrays containing approximately 25,000 human gene sequences
(Hu25K microarrays) were used for this study. Sequences for microarrays were selected
from RefSeq (a collection of non-redundant mRNA sequences, located on the Internet at
nlm.nih.gov/LocusLink/refseq.html) and Phil Green EST contigs, which is a collection of
EST contigs assembled by Dr. Phil Green et al. at the University of Washington (Ewing and
Green, Nat. Genet. 25(2):232-4 (2000)), available on the Internet at phrap.org/est_assembly/
index.html. Each mRNA or EST contig was represented on the Hu25K microarray by a single
60mer oligonucleotide essentially as described in Hughes et al., Nature Biotech. 19(4):342-
347 and in International Publication WO 01/06013, published January 25, 2001, and in
International Publication WO 01/05935, published January 25, 2001, except that the rules
for oligo screening were modified to remove oligonucleotides with more than 30% C or with
6 or more contiguous C residues.
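
By way of illustration only, the modified screening rule stated above (no more than 30% C, and no run of 6 or more contiguous C residues) may be expressed as in the following Python sketch. The function name `passes_c_screen` and its parameters are hypothetical, and the other screening rules of the cited references are not reproduced here.

```python
def passes_c_screen(oligo, max_c_fraction=0.30, max_c_run=5):
    """Return True if an oligonucleotide survives the C-content screen."""
    seq = oligo.upper()
    if not seq:
        return False
    # Reject oligonucleotides with more than 30% C.
    if seq.count("C") / len(seq) > max_c_fraction:
        return False
    # Reject oligonucleotides with 6 or more contiguous C residues.
    return "C" * (max_c_run + 1) not in seq

candidates = ["ACGT" * 15, "CA" * 30, "CCCCCC" + "A" * 54]
kept = [s for s in candidates if passes_c_screen(s)]
print(len(kept))
```

Of the three hypothetical 60mers, only the first is kept: the second is 50% C and the third contains a run of six C residues.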


Example 1: Differentially regulated gene sets and overall expression patterns of breast
cancer tumors

Of the approximately 25,000 sequences represented on the microarray, a group of
approximately 5,000 genes that were significantly regulated across the group of samples
was selected. A gene was determined to be significantly differentially regulated with cancer
of the breast if it showed more than a two-fold change in transcript level as compared to a
sporadic tumor pool, and if the p-value for differential regulation (Hughes et al., Cell
102:109-126 (2000)) was less than 0.01 either upwards or downwards in at least five out of
98 tumor samples.
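
By way of illustration only, the selection criterion above may be sketched in Python as follows. The sketch assumes that expression is supplied as log10 ratios relative to the sporadic pool, and it reads the criterion as requiring both conditions (fold change and p-value) to hold in the same sample for at least five samples; that reading, the function name `is_significant`, and the parameter names are assumptions of this example.

```python
import math

def is_significant(log10_ratios, p_values, fold=2.0, p_cut=0.01, min_samples=5):
    """Return True if a gene is regulated in at least `min_samples` samples."""
    threshold = math.log10(fold)  # a two-fold change, up or down, in log10 units
    hits = sum(
        1
        for ratio, p in zip(log10_ratios, p_values)
        if abs(ratio) > threshold and p < p_cut
    )
    return hits >= min_samples

# Hypothetical gene measured in six samples; five pass both conditions.
ratios = [0.5, -0.4, 0.35, 0.6, -0.7, 0.1]
p_vals = [0.001] * 6
print(is_significant(ratios, p_vals))
```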

An unsupervised clustering algorithm allowed us to cluster patients based on
their similarities measured over this set of ~5,000 significant genes. The similarity measure
between two patients x and y is defined as

$$S = \frac{\sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{\sigma_{x_i}}\right)\left(\frac{y_i - \bar{y}}{\sigma_{y_i}}\right)}{\sqrt{\sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{\sigma_{x_i}}\right)^2 \sum_{i=1}^{N} \left(\frac{y_i - \bar{y}}{\sigma_{y_i}}\right)^2}}$$    Equation (5)

In Equation (5), x and y are two patients with components of log ratio $x_i$ and $y_i$,
$i = 1, \ldots, N = 5{,}100$. Associated with every value $x_i$ is error $\sigma_{x_i}$. The
smaller the value $\sigma_{x_i}$, the more reliable the measurement $x_i$.
$\bar{x} = \sum_{i=1}^{N} \frac{x_i}{\sigma_{x_i}^2} \Big/ \sum_{i=1}^{N} \frac{1}{\sigma_{x_i}^2}$
is the error-weighted arithmetic mean.
The use of correlation as similarity metric emphasizes the importance of co-regulation in
clustering rather than the amplitude of regulations.
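
By way of illustration only, an error-weighted correlation of the kind described above may be sketched in Python as follows. The sketch assumes a form in which each deviation from the error-weighted mean is divided by its measurement error before the correlation is taken; the function names and argument names are hypothetical and the sketch is not presented as the disclosed implementation.

```python
import math

def weighted_mean(values, errors):
    """Error-weighted arithmetic mean with weights 1 / error**2."""
    weights = [1.0 / (e * e) for e in errors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def error_weighted_correlation(x, x_err, y, y_err):
    """Correlation of two log-ratio profiles, down-weighting noisy measurements."""
    x_bar = weighted_mean(x, x_err)
    y_bar = weighted_mean(y, y_err)
    zx = [(xi - x_bar) / e for xi, e in zip(x, x_err)]
    zy = [(yi - y_bar) / e for yi, e in zip(y, y_err)]
    numerator = sum(a * b for a, b in zip(zx, zy))
    denominator = math.sqrt(sum(a * a for a in zx) * sum(b * b for b in zy))
    return numerator / denominator

# Two hypothetical patients whose profiles differ only in amplitude.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
err = [1.0, 1.0, 1.0, 1.0]
print(error_weighted_correlation(x, err, y, err))
```

Because the second profile is an amplified copy of the first, the two are fully correlated, which illustrates why a correlation-based metric emphasizes co-regulation rather than amplitude.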

The set of approximately 5,000 genes can be clustered based on their
similarities measured over the group of 98 tumor samples. The similarity measure between
two genes was defined in the same way as in Equation (1) except that now for each gene,
there are 98 components of log ratio measurements.

The result of such a two-dimensional clustering is displayed in FIG. 3. Two
distinctive patterns emerge from the clustering. The first pattern consists of a group of
patients in the lower part of the plot whose regulations are very different from the sporadic
pool. The other pattern is made of a group of patients in the upper part of the plot whose
expressions are only moderately regulated in comparison with the sporadic pool. These
dominant patterns suggest that the tumors can be unambiguously divided into two distinct
types based on this set of ~5,000 significant genes.

To help understand these patterns, they were associated with estrogen
receptor (ER), progesterone receptor (PR), tumor grade, presence of lymphocytic infiltrate,
and angioinvasion (FIG. 3). The lower group in FIG. 3, which features the dominant pattern,
consists of 36 patients. Of the 39 ER-negative patients, 34 patients are clustered together in
this group. From FIG. 4, it was observed that the expression of estrogen receptor alpha
gene ESR1 and a large group of co-regulated genes are consistent with this expression
pattern.

From FIG. 3 and FIG. 4, it was concluded that gene expression patterns can
be used to classify tumor samples into subgroups of diagnostic interest. Thus, genes
co-regulated across 98 tumor samples contain information about the molecular basis of breast
cancers. The combination of clinical data and microarray-measured gene abundance of
ESR1 demonstrates that the distinct types are related to, or at least are reported by, the ER
status.

Example 2: Identification of Genetic Markers Distinguishing Estrogen Receptor (+)
From Estrogen Receptor (-) Patients

The results described in this Example allow the identification of expression
marker genes that differentiate two major types of tumor cells: the "ER-negative" group and
the "ER-positive" group. The differentiation of samples by ER status was accomplished in
four steps: (1) identification of a set of candidate marker genes that correlate with ER
level; (2) rank-ordering these candidate genes by strength of correlation; (3) optimization of
the number of marker genes; and (4) classifying samples based on these marker genes.



1. Selection of candidate discriminating genes

In the first step, a set of candidate discriminating genes was identified based
on gene expression data of training samples. Specifically, we calculated the correlation
coefficients $\rho$ between the category numbers $\vec{c}$ or ER level and logarithmic
expression ratio $\vec{r}$ across all the samples for each individual gene:

$$\rho = \frac{\vec{c} \cdot \vec{r}}{\|\vec{c}\| \cdot \|\vec{r}\|}$$    Equation (2)

The histogram of resultant correlation coefficients is shown in FIG. 5A as a gray line.
While the amplitude of correlation or anti-correlation is small for the majority of genes, the
amplitude for some genes is as great as 0.5. Genes whose expression ratios either correlate
or anti-correlate well with the diagnostic category of interest are used as reporter genes for
the category.
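
By way of illustration only, the correlation between category numbers and log expression ratios, and the selection of reporter genes by a threshold on its amplitude, may be sketched in Python as follows. The sketch assumes that both vectors are mean-centered before the normalized dot product is taken (which yields the ordinary Pearson correlation coefficient); the centering, the function names, and the toy gene names are assumptions of this example.

```python
import math

def category_correlation(categories, log_ratios):
    """Normalized dot product of the mean-centered category and log-ratio vectors."""
    n = len(categories)
    c_mean = sum(categories) / n
    r_mean = sum(log_ratios) / n
    c = [v - c_mean for v in categories]
    r = [v - r_mean for v in log_ratios]
    norm = math.sqrt(sum(v * v for v in c) * sum(v * v for v in r))
    if norm == 0.0:
        return 0.0  # a constant vector carries no correlation information
    return sum(a * b for a, b in zip(c, r)) / norm

def select_reporters(categories, expression_by_gene, threshold=0.3):
    """Keep genes whose correlation or anti-correlation exceeds the threshold."""
    selected = {}
    for gene, ratios in expression_by_gene.items():
        rho = category_correlation(categories, ratios)
        if abs(rho) > threshold:
            selected[gene] = rho
    return selected

# Hypothetical data: category 1 for two samples, category 0 for two others.
categories = [1, 1, 0, 0]
expression = {
    "g_up": [0.9, 1.1, -1.0, -1.0],
    "g_down": [-0.9, -1.1, 1.0, 1.0],
    "g_flat": [0.2, -0.2, 0.2, -0.2],
}
print(sorted(select_reporters(categories, expression)))
```

In this toy input, the correlated and the anti-correlated gene are both retained, and the gene that does not track the category is dropped.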

Genes having a correlation coefficient larger than 0.3 ("correlated genes") or
less than -0.3 ("anti-correlated genes") were selected as reporter genes. The threshold of
0.3 was selected based on the correlation distribution for cases where there is no real
correlation (one can use permutations to determine this distribution). Statistically, this
distribution width depends upon the number of samples used in the correlation calculation.
The distribution width for control cases (no real correlation) is approximately
$1/\sqrt{n-3}$, where n = the number of samples. In our case, n = 98. Therefore, a threshold of 0.3
roughly corresponds to $3\sigma$ in the distribution ($3 \times 1/\sqrt{n-3}$).
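
By way of illustration only, the arithmetic behind the 0.3 threshold may be checked with the following Python sketch. It assumes the familiar large-sample approximation that the width of the null distribution of a correlation coefficient is about 1/sqrt(n - 3); the function name is hypothetical.

```python
import math

def null_correlation_width(n_samples):
    """Approximate width of the correlation distribution when there is no real correlation."""
    return 1.0 / math.sqrt(n_samples - 3)

sigma = null_correlation_width(98)
print(round(sigma, 4), round(3 * sigma, 4))  # prints 0.1026 0.3078
```

With n = 98 samples, three times the width is approximately 0.31, which is consistent with the threshold of 0.3 used above.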

2,460 such genes were found to satisfy this criterion. In order to evaluate the
significance of the correlation coefficient of each gene with the ER level, a bootstrap
technique was used to generate Monte-Carlo data that randomize the association between
gene expression data of the samples and their categories. The distribution of correlation
coefficients obtained from one Monte-Carlo trial is shown as a dashed line in FIG. 5A. To
estimate the significance of the 2,460 marker genes as a group, 10,000 Monte-Carlo runs
were generated. The collection of 10,000 such Monte-Carlo trials forms the null
hypothesis. The number of genes that satisfy the same criterion for Monte-Carlo data varies
from run to run. The frequency distribution from 10,000 Monte-Carlo runs of the number
of genes having correlation coefficients of >0.3 or <-0.3 is displayed in FIG. 5B. Both the
mean and maximum value are much smaller than 2,460. Therefore, the significance of this
gene group as the discriminating gene set between ER(+) and ER(-) samples is estimated to
be greater than 99.99%.
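
By way of illustration only, the Monte-Carlo control described above may be sketched in Python as follows. The sketch randomizes the association between samples and categories by shuffling the category labels, counts the genes that pass the same threshold in each run, and compares those counts with the observed count. Label shuffling as the randomization scheme, the function names, the fixed random seed, and the small default number of runs are assumptions of this example.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(a)
    a_mean, b_mean = sum(a) / n, sum(b) / n
    da = [v - a_mean for v in a]
    db = [v - b_mean for v in b]
    norm = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / norm if norm else 0.0

def count_reporters(labels, expression, threshold=0.3):
    """Number of genes whose |correlation| with the labels exceeds the threshold."""
    return sum(1 for gene in expression if abs(pearson(labels, gene)) > threshold)

def monte_carlo_counts(labels, expression, n_runs=1000, threshold=0.3, seed=0):
    """Reporter counts obtained after randomizing the sample/category association."""
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(n_runs):
        rng.shuffle(shuffled)
        counts.append(count_reporters(shuffled, expression, threshold))
    return counts

def empirical_p_value(observed, null_counts):
    """Fraction of Monte-Carlo runs with at least as many reporters as observed."""
    return sum(1 for c in null_counts if c >= observed) / len(null_counts)
```

In use, `count_reporters` would be applied once to the real labels and `monte_carlo_counts` to the same expression data; a group significance of the kind quoted above corresponds to one minus the value returned by `empirical_p_value`.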

2. Rank-ordering of candidate discriminating genes

In the second step, genes on the candidate list were rank-ordered based on
the significance of each gene as a discriminating gene. The markers were rank-ordered
either by amplitude of correlation, or by using a metric similar to a Fisher statistic:

$$t = \frac{\langle x_1 \rangle - \langle x_2 \rangle}{\sqrt{\left[\sigma_1^2 (n_1 - 1) + \sigma_2^2 (n_2 - 1)\right] / (n_1 + n_2 - 1) / (1/n_1 + 1/n_2)}}$$    Equation (3)

In Equation (3), $\langle x_1 \rangle$ is the error-weighted average of log ratio within the ER(-)
group, and $\langle x_2 \rangle$ is the error-weighted average of log ratio within the ER(+) group.
$\sigma_1^2$ is the variance of log ratio within the ER(-) group and $n_1$ is the number of
samples that had valid measurements of log ratios. $\sigma_2^2$ is the variance of log ratio
within the ER(+) group and $n_2$ is the number of samples that had valid measurements of
log ratios. The t-value in Equation (3) represents
the variance-compensated difference between two means. The confidence level of each
gene in the candidate list was estimated with respect to a null hypothesis derived from the
actual data set using a bootstrap technique; that is, many artificial data sets were generated
by randomizing the association between the clinical data and the gene expression data.
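
By way of illustration only, the ranking metric of Equation (3) may be computed as in the following Python sketch. The sketch follows the grouping of terms as printed in Equation (3), which differs from the textbook pooled two-sample t statistic (the latter multiplies by 1/n1 + 1/n2 and uses n1 + n2 - 2). It takes group means and variances that are assumed to have been computed beforehand; the function name and argument names are hypothetical.

```python
import math

def fisher_like_metric(mean1, var1, n1, mean2, var2, n2):
    """Variance-compensated difference between two group means, as printed in Equation (3)."""
    pooled = (var1 * (n1 - 1) + var2 * (n2 - 1)) / (n1 + n2 - 1)
    scale = math.sqrt(pooled / (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / scale

# Hypothetical gene: group means 1.0 and 0.0, unit variance, ten samples per group.
t_value = fisher_like_metric(1.0, 1.0, 10, 0.0, 1.0, 10)
print(round(t_value, 4))
```

Candidate genes could then be sorted by the absolute value of this metric, with the largest values ranked first.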

3. Optimization of the number of marker genes

The leave-one-out method was used for cross validation in order to optimize
the discriminating genes. For a set of marker genes from the rank-ordered candidate list, a
classifier was trained with 97 samples, and was used to predict the status of the remaining
sample. The procedure was repeated for each of the samples in the pool, and the number of
cases where the prediction for the one left out is wrong or correct was counted.

The above performance evaluation from leave-one-out cross validation was
repeated by successively adding more marker genes from the candidate list. The
performance as a function of the number of marker genes is shown in FIG. 6. The error
rates for type 1 and type 2 errors varied with the number of marker genes used, but were
both minimal while the number of the marker genes is around 550. Therefore, this set of
550 genes is considered the optimal set of marker genes that can be used to
classify breast cancer tumors into the "ER-negative" group and the "ER-positive" group. FIG. 7
shows the classification of patients as ER(+) or ER(-) based on this 550 marker set. FIG. 8
shows the correlation of each tumor to the ER-negative template versus the correlation of
each tumor to the ER-positive template.
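
By way of illustration only, the search for the number of marker genes may be sketched in Python as follows. The sketch assumes a caller-supplied function, here called `loo_error_fn`, that returns the number of leave-one-out misclassifications for a given list of marker genes; that function, the name `best_marker_count`, and the `step` parameter are hypothetical, and a single combined error count is used in place of separate type 1 and type 2 error rates.

```python
def best_marker_count(ranked_genes, loo_error_fn, step=1):
    """Grow the marker set from the top of the ranked list and keep the best size."""
    best_k, best_errors = None, None
    for k in range(step, len(ranked_genes) + 1, step):
        errors = loo_error_fn(ranked_genes[:k])
        # Strict comparison keeps the smallest set among equally good sizes.
        if best_errors is None or errors < best_errors:
            best_k, best_errors = k, errors
    return best_k, best_errors

# Toy stand-in for a cross-validation routine: fewest errors with three genes.
ranked = ["g1", "g2", "g3", "g4", "g5"]
print(best_marker_count(ranked, lambda genes: abs(len(genes) - 3)))
```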


4. Classification based on marker genes 

In the fourth step, a set of classifier parameters was calculated for each type of
training data set based on either of the above ranking methods. A template for the ER(-)
group (called $\vec{z}_1$) was generated using the error-weighted log ratio average of the
selected group of genes. Similarly, a template for the ER(+) group (called $\vec{z}_2$) was
generated using the error-weighted log ratio average of the selected group of genes. Two
classifier parameters ($P_1$ and $P_2$) were defined based on either correlation or distance.
$P_1$ measures the similarity between one sample $\vec{y}$ and the ER(-) template $\vec{z}_1$
over this selected group of genes. $P_2$ measures the similarity between one sample $\vec{y}$
and the ER(+) template $\vec{z}_2$ over this selected group of genes. The correlation $P_i$
is defined as:

$$P_i = \frac{\vec{z}_i \cdot \vec{y}}{\|\vec{z}_i\| \cdot \|\vec{y}\|}$$    Equation (1)

A "leave-one-out" method was used to cross-validate the classifier built
based on the marker genes. In this method, one sample was reserved for cross validation
each time the classifier was trained. For the set of 550 optimal marker genes, the classifier
was trained with 97 of the 98 samples, and the status of the remaining sample was
predicted. This procedure was performed with each of the 98 patients. The number of
cases where the prediction was wrong or correct was counted. It was further determined
that subsets of as few as ~50 of the 2,460 genes are able to classify tumors as ER(+) or ER(-)
nearly as well as using the total set.
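
By way of illustration only, template-based classification with leave-one-out cross validation may be sketched in Python as follows. Several simplifications are assumptions of this example: templates are plain (not error-weighted) averages, similarity is the normalized dot product of a sample with each template, and a sample is assigned to the template with the larger similarity, a decision rule that the text above does not spell out. All function names and labels are hypothetical.

```python
import math

def similarity(u, v):
    """Normalized dot product of two expression profiles."""
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def template(samples):
    """Per-gene average over a group of samples (unweighted, for simplicity)."""
    return [sum(column) / len(samples) for column in zip(*samples)]

def classify(sample, neg_template, pos_template):
    """Assign the sample to the template it resembles more closely."""
    p1 = similarity(sample, neg_template)
    p2 = similarity(sample, pos_template)
    return "ER-" if p1 > p2 else "ER+"

def leave_one_out_errors(samples, labels):
    """Count misclassifications when each sample in turn is held out of training."""
    errors = 0
    for i, held_out in enumerate(samples):
        train = [(s, l) for j, (s, l) in enumerate(zip(samples, labels)) if j != i]
        neg = template([s for s, l in train if l == "ER-"])
        pos = template([s for s, l in train if l == "ER+"])
        if classify(held_out, neg, pos) != labels[i]:
            errors += 1
    return errors

# Hypothetical two-gene profiles for six samples.
profiles = [[1.0, -1.0], [0.9, -1.1], [1.1, -0.8],
            [-1.0, 1.0], [-0.8, 1.2], [-1.1, 0.9]]
status = ["ER-", "ER-", "ER-", "ER+", "ER+", "ER+"]
print(leave_one_out_errors(profiles, status))
```

Because each held-out sample is excluded from both templates before it is classified, the error count reflects performance on data the classifier has not seen.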

In a small number of cases, there was disagreement between classification by
the 550 marker set and a clinical classification. In comparing the microarray measured log
ratio of expression for ESR1 to the clinical binary decision (negative or positive) of ER
status for each patient, it was seen that the measured expression is consistent with the
qualitative category of clinical measurements (mixture of two methods) for the majority of
tumors. For example, two patients who were clinically diagnosed as ER(+) actually
exhibited low expression of ESR1 from microarray measurements and were classified as
ER(-) by the 550 marker genes. Additionally, 3 patients who were clinically diagnosed as
ER(-) exhibited high expression of ESR1 from microarray measurements and were
classified as ER(+) by the same 550 marker genes. Statistically, however, microarray
measured gene expression of ESR1 correlates with the dominant patterns better than
clinically determined ER status.

Example 3: Identification of Genetic Markers Distinguishing BRCA1 Tumors From
Sporadic Tumors in Estrogen Receptor (-) Patients

The BRCA1 mutation is one of the major clinical categories in breast cancer
tumors. It was determined that of tumors of 38 patients in the ER(-) group, 17 exhibited the
BRCA1 mutation, while 21 were sporadic tumors. A method was therefore developed that
enabled the differentiation of the 17 BRCA1 mutation tumors from the 21 sporadic tumors
in the ER(-) group.

1. Selection of candidate discriminating genes

In the first step, a set of candidate genes was identified based on the gene
expression patterns of these 38 samples. We first calculated the correlation between the
BRCA1-mutation category number and the expression ratio across all 38 samples for each
individual gene by Equation (2). The distribution of the correlation coefficients is shown as
a histogram defined by the solid line in FIG. 9A. We observed that, while the majority of
genes do not correlate with BRCA1 mutation status, a small group of genes correlated at
significant levels. It is likely that genes with larger correlation coefficients would serve as
reporters for discriminating tumors of BRCA1 mutation carriers from sporadic tumors
within the ER(-) group.

In order to evaluate the significance of each correlation coefficient with
respect to a null hypothesis that such correlation coefficient could be found by chance, a
bootstrap technique was used to generate Monte-Carlo data that randomizes the association
between gene expression data of the samples and their categories. 10,000 such Monte-Carlo
runs were generated as a control in order to estimate the significance of the marker genes as
a group. A threshold of 0.35 in the absolute amplitude of correlation coefficients (either
correlation or anti-correlation) was applied both to the real data and the Monte-Carlo data.
Following this method, 430 genes were found to satisfy this criterion for the experimental
data. The p-value of the significance, as measured against the 10,000 Monte-Carlo trials, is
approximately 0.0048 (FIG. 9B). That is, the probability that this set of 430 genes
contained useful information about BRCA1-like tumors vs sporadic tumors exceeds 99%.

2. Rank-ordering of candidate discriminating genes

In the second step, genes on the candidate list were rank-ordered based on 
the significance of each gene as a discriminating gene. Here, we used the absolute amplitude 
of correlation coefficients to rank order the marker genes. 

3. Optimization of discriminating genes

In the third step, a subset of genes from the top of this rank-ordered list was
used for classification. We defined a BRCA1 group template (called $\vec{z}_1$) by using the
error-weighted log ratio average of the selected group of genes. Similarly, we defined a
non-BRCA1 group template (called $\vec{z}_2$) by using the error-weighted log ratio average
of the selected group of genes. Two classifier parameters ($P_1$ and $P_2$) were defined based on
either correlation or distance. $P_1$ measures the similarity between one sample $\vec{y}$ and the
BRCA1 template $\vec{z}_1$ over this selected group of genes. $P_2$ measures the similarity between
one sample $\vec{y}$ and the non-BRCA1 template $\vec{z}_2$ over this selected group of genes. For
correlation, $P_1$ and $P_2$ were defined in the same way as in Equation (4).

The leave-one-out method was used for cross validation in order to optimize
the discriminating genes as described in Example 2. For a set of marker genes from the
rank-ordered candidate list, the classifier was trained with 37 samples and the remaining one
was predicted. The procedure was repeated for all the samples in the pool, and the number
of cases where the prediction for the one left out is wrong or correct was counted.

To determine the number of markers constituting a viable subset, the above
performance evaluation from leave-one-out cross validation was repeated by cumulatively
adding more marker genes from the candidate list. The performance as a function of the
number of marker genes is shown in FIG. 10. The error rates for type 1 (false negative) and
type 2 (false positive) errors (Bendat & Piersol, Random Data Analysis and
Measurement Procedures, 2d ed., Wiley Interscience, p. 89) reached optimal ranges
when the number of the marker genes is approximately 100. Therefore, a set of about 100
genes is considered to be the optimal set of marker genes that can be used to classify tumors
in the ER(-) group as either BRCA1-related tumors or sporadic tumors.

The classification results using the optimal 100 genes are shown in FIGS.
11A and 11B. As shown in FIG. 11A, the co-regulation patterns of the sporadic patients
differ from those of the BRCA1 patients primarily in the amplitude of regulation. Only one
sporadic tumor was classified into the BRCA1 group. Patients in the sporadic group are not
necessarily BRCA1 mutation negative; however, it is estimated that only approximately 5%
of sporadic tumors are indeed BRCA1-mutation carriers.

Example 4: Identification of Genetic Markers Distinguishing Sporadic Tumor Patients
with >5 Year Versus <5 Year Survival Times

78 tumors from sporadic breast cancer patients were used to explore
prognostic predictors from gene expression data. Of the 78 samples in this sporadic breast
cancer group, 44 samples were known clinically to have had no distant metastases within 5
years since the initial diagnosis ("no distant metastases group") and 34 samples had distant
metastases within 5 years since the initial diagnosis ("distant metastases group"). A group
of 231 markers, and optimally a group of 70 markers, was identified that allowed
differentiation between these two groups.

1. Selection of candidate discriminating genes

In the first step, a set of candidate discriminating genes was identified based
on gene expression data of these 78 samples. The correlation between the prognostic
category number (distant metastases vs. no distant metastases) and the logarithmic
expression ratio across all samples for each individual gene was calculated using Equation
(2). The distribution of the correlation coefficients is shown as a solid line in FIG. 12A.
FIG. 12A also shows the result of one Monte-Carlo run as a dashed line. We observe that
even though the majority of genes do not correlate with the prognostic categories, a small
group of genes do correlate. It is likely that genes with larger correlation coefficients would
be more useful as reporters for the prognosis of interest, namely the distant metastases
group and the no distant metastases group.

In order to evaluate the significance of each correlation coefficient with
respect to a null hypothesis that such a correlation coefficient can be found by chance, we
used a bootstrap technique to generate data from 10,000 Monte-Carlo runs as a control
(FIG. 12B). We then selected genes that have a correlation coefficient either larger than
0.3 ("correlated genes") or less than -0.3 ("anti-correlated genes"). The same selection
criterion was applied both to the real data and to the Monte-Carlo data. Using this
comparison, 231 markers from the experimental data were identified that satisfy this
criterion. The probability that this gene set for discriminating patients between the distant
metastases group and the no distant metastases group was chosen by random fluctuation is
approximately 0.003.
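The selection step above can be sketched in code. The following is a minimal illustration only, not the procedure actually used: it assumes that Equation (2) is an ordinary Pearson correlation, and it replaces the bootstrap control with a simple shuffling of the prognostic category labels. The function names (pearson, select_markers, chance_probability) are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def select_markers(log_ratios, categories, threshold=0.3):
    """Indices of genes (rows) whose |correlation| with the categories exceeds threshold."""
    return [g for g, row in enumerate(log_ratios)
            if abs(pearson(row, categories)) > threshold]

def chance_probability(log_ratios, categories, threshold=0.3, n_runs=200, seed=0):
    """Fraction of label-shuffled runs that select at least as many genes as the real labels.

    Assumption: label shuffling stands in for the bootstrap control described in the text.
    """
    rng = random.Random(seed)
    observed = len(select_markers(log_ratios, categories, threshold))
    shuffled = list(categories)
    hits = 0
    for _ in range(n_runs):
        rng.shuffle(shuffled)
        if len(select_markers(log_ratios, shuffled, threshold)) >= observed:
            hits += 1
    return observed, hits / n_runs
```

Here log_ratios holds one row per gene and one column per sample, and categories holds one number per sample (for example 0 for no distant metastases and 1 for distant metastases).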



2. Rank-ordering of candidate discriminating genes

In the second step, genes on the candidate list were rank-ordered based on
the significance of each gene as a discriminating gene. Specifically, a metric similar to a
"Fisher" statistic, defined in Equation (3), was used for the purpose of rank ordering. The
confidence level of each gene in the candidate list was estimated with respect to a null
hypothesis derived from the actual data set using the bootstrap technique. Genes in the
candidate list can also be ranked by the amplitude of correlation coefficients.
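As a rough illustration of rank ordering, the sketch below scores each gene with a generic two-group statistic (difference of group means scaled by a pooled standard deviation). This is an assumed stand-in: Equation (3) is not reproduced in this passage, and the exact form of the metric may differ. The names fisher_like_score and rank_genes are hypothetical.

```python
import math
from statistics import mean, variance

def fisher_like_score(values, labels):
    """|mean difference| between the two groups, scaled by the pooled standard error.

    Assumption: a generic t-like statistic, used here only as a placeholder for Equation (3).
    """
    a = [v for v, l in zip(values, labels) if l == 1]
    b = [v for v, l in zip(values, labels) if l == 0]
    diff = abs(mean(a) - mean(b))
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    if pooled == 0.0:
        return math.inf if diff > 0.0 else 0.0
    return diff / math.sqrt(pooled * (1.0 / len(a) + 1.0 / len(b)))

def rank_genes(log_ratios, labels):
    """Return gene indices sorted from most to least discriminating, plus the scores."""
    scores = [fisher_like_score(row, labels) for row in log_ratios]
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return order, scores
```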

3. Optimization of discriminating genes

In the third step, a subset of 5 genes from the top of this rank-ordered list
was selected to use as discriminating genes to classify the 78 tumors into a "distant metastases
group" or a "no distant metastases group". The leave-one-out method was used for cross
validation. Specifically, 77 samples defined a classifier based on the set of selected
discriminating genes, and these were used to predict the remaining sample. This procedure
was repeated so that each of the 78 samples was predicted. The number of cases in which
predictions were correct or incorrect was counted. The performance of the classifier was
measured by the error rates of type 1 and type 2 for this selected gene set.

We repeated the above performance evaluation procedure, adding 5 more
marker genes each time from the top of the candidate list, until all 231 genes were used. As
shown in FIG. 13, the number of mis-predictions of type 1 and type 2 changes
dramatically with the number of marker genes employed. The combined error rate reached
a minimum when 70 marker genes from the top of our candidate list were used. Therefore,
this set of 70 genes is the optimal, preferred set of marker genes useful for the classification
of sporadic tumor patients into either the distant metastases or no distant metastases group.
Fewer or more markers also act as predictors, but are less efficient, either because of higher
error rates, or the introduction of statistical noise.
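The leave-one-out evaluation over growing gene sets can be sketched as follows. This is a minimal sketch under stated assumptions: the classifier is taken to be a comparison of each sample's correlation with the mean profile of each group (in the spirit of the templates described elsewhere in this document), the rows of ranked_data are assumed to be already rank-ordered, and the rank ordering is not repeated inside each fold. All function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0.0 or syy == 0.0 else sxy / math.sqrt(sxx * syy)

def template(data, columns):
    """Mean expression profile (one value per gene) over the given sample columns."""
    return [sum(row[j] for j in columns) / len(columns) for row in data]

def predict_left_out(data, labels, i):
    """Predict sample i from templates built on the other samples (0 = good, 1 = poor)."""
    good = [j for j, l in enumerate(labels) if l == 0 and j != i]
    poor = [j for j, l in enumerate(labels) if l == 1 and j != i]
    sample = [row[i] for row in data]
    c_good = pearson(sample, template(data, good))
    c_poor = pearson(sample, template(data, poor))
    return 0 if c_good > c_poor else 1

def loocv_errors(data, labels):
    """Return (type 1, type 2) counts: false negatives and false positives, poor = positive."""
    type1 = type2 = 0
    for i, truth in enumerate(labels):
        pred = predict_left_out(data, labels, i)
        if truth == 1 and pred == 0:
            type1 += 1
        elif truth == 0 and pred == 1:
            type2 += 1
    return type1, type2

def sweep_gene_counts(ranked_data, labels, step=5):
    """Error counts for the top 5, 10, 15, ... genes of a rank-ordered gene list."""
    return {n: loocv_errors(ranked_data[:n], labels)
            for n in range(step, len(ranked_data) + 1, step)}
```

The gene count with the smallest combined error in the returned dictionary plays the role of the optimal marker set size.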

4. Reoccurrence probability curves

The prognostic classification of 78 patients with sporadic breast cancer
tumors into two distinct subgroups was predicted based on their expression of the 70
optimal marker genes (FIGS. 14 and 15).

To evaluate the prognostic classification of sporadic patients, we predicted
the outcome of each patient by a classifier trained by the remaining 77 patients based on the
70 optimal marker genes. FIG. 16 plots the distant metastases probability as a function of
the time since initial diagnosis for the two predicted groups. The difference between these
two reoccurrence curves is significant. Using the χ² test (S-PLUS 2000 Guide to Statistics,
vol. 2, MathSoft, p. 44), the p-value is estimated to be ~10^-9. The distant metastases
probability as a function of the time since initial diagnosis was also compared between
ER(+) and ER(-) individuals (FIG. 17), between PR(+) and PR(-) individuals (FIG. 18), and between
individuals with different tumor grades (FIGS. 19A, 19B). In comparison, the p-values for
the differences between two prognostic groups based on clinical data are much less
significant than that based on gene expression data, ranging from 10^-3 to 1.

To parameterize the reoccurrence probability as a function of time since
initial diagnosis, the curve was fitted to one type of survival model, "normal":

$$P = \alpha \times \exp\left(-t^{2}/\tau^{2}\right) \qquad (4)$$

For fixed α = 1, we found that τ = 125 months for patients in the no distant metastases group
and τ = 36 months for patients in the distant metastases group. Using tumor grades, we
found τ = 100 months for patients with tumor grades 1 and 2 and τ = 60 months for patients with
tumor grade 3. It is accepted clinical practice that tumor grades are the best available
prognostic predictor. However, the difference between the two prognostic groups classified
based on 70 marker genes is much more significant than those classified by the best
available clinical information.
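To give a feel for the fitted time constants, the short sketch below evaluates the "normal" model of Equation (4) with α = 1 for the four τ values reported above. The time points are illustrative choices and do not come from the source; the function and dictionary names are hypothetical.

```python
import math

def normal_model(t_months, tau_months, alpha=1.0):
    """Equation (4): P = alpha * exp(-t^2 / tau^2)."""
    return alpha * math.exp(-(t_months ** 2) / (tau_months ** 2))

# Time constants (months) reported in the text for alpha = 1.
TAU_MONTHS = {
    "no distant metastases": 125.0,
    "distant metastases": 36.0,
    "tumor grades 1 and 2": 100.0,
    "tumor grade 3": 60.0,
}

if __name__ == "__main__":
    for t in (24.0, 60.0, 120.0):  # illustrative time points, not from the source
        for group, tau in TAU_MONTHS.items():
            print(f"t = {t:5.0f} months, {group:22s}: P = {normal_model(t, tau):.3f}")
```

Because P equals α at t = 0 and falls off with t²/τ², a larger τ corresponds to a curve that stays near its starting value for longer, which is why the gap between 125 and 36 months is wider than the gap between 100 and 60 months.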



5. Prognostic prediction for 19 independent sporadic tumors 

To confirm the proposed prognostic classification method and to ensure the 
reproducibility, robustness, and predicting power of the 70 optimal prognostic marker 
genes, we applied the same classifier to 19 independent tumor samples from sporadic breast 
cancer patients, prepared separately at The Netherlands Cancer Institute (NKI). The same 
reference pool was used. 

The classification results of 19 independent sporadic tumors are shown in 
Figure 20. FIG. 20A shows the log ratio of expression regulation of the same 70 optimum 
marker genes. Based on our classifier model, we expected the misclassification of 
19*(6+7)/78 = 3.2 tumors. Consistently, (1+3) = 4 of 19 tumors were misclassified. 



6. Clinical parameters as a group vs. microarray data: Results of logistic
regression

In the previous section, the predictive power of each individual clinical
parameter was compared with that of the expression data. However, it is more meaningful
to combine all the clinical parameters as a group, and then compare them to the expression
data. This requires multivariate modeling; the method chosen was logistic regression.
Such an approach also demonstrates how much improvement the microarray approach adds
to the results of the clinical data.

The clinical parameters used for the multivariate modeling were: (1) tumor
grade; (2) ER status; (3) presence or absence of the progestogen receptor (PR); (4) tumor
size; (5) patient age; and (6) presence or absence of angioinvasion. For the microarray data,
two correlation coefficients were used. One is the correlation to the mean of the good
prognosis group (C1) and the other is the correlation to the mean of the bad prognosis group
(C2). When calculating the correlation coefficients for a given patient, this patient is
excluded from either of the two means.

The logistic regression optimizes the coefficient of each input parameter to
best predict the outcome of each patient. One way to judge the predictive power of each
input parameter is by how much deviance (similar to Chi-square in the linear regression; see,
for example, Hosmer & Lemeshow, APPLIED LOGISTIC REGRESSION, John Wiley & Sons,
(2000)) the parameter accounts for. The best predictor should account for most of the
deviance. To fairly assess the predictive power, each parameter was modeled
independently. The microarray parameters explain most of the deviance, and hence are
powerful predictors.

The clinical parameters, and the two microarray parameters, were then
modeled as a group. The total deviance explained by the six clinical parameters was 31.5,
and the total deviance explained by the microarray parameters was 39.4. However, when the
clinical data was modeled first, and the two microarray parameters added, the final deviance
accounted for was 57.0.

The logistic regression computes the likelihood that a patient belongs to the
good or poor prognostic group. FIGS. 21A and 21B show the sensitivity vs. (1-specificity).
The plots were generated by varying the threshold on the model predicted likelihood. The
curve which goes through the top left corner is the best (high sensitivity with high
specificity). The microarray outperformed the clinical data by a large margin. For
example, at a fixed sensitivity of around 80%, the specificity was ~80% from the microarray
data, and ~65% from the clinical data for the good prognosis group. For the poor prognosis
group, the corresponding specificities were ~80% and ~70%, again at a fixed sensitivity of
80%. Combining the microarray data with the clinical data further improved the results.
The result can also be displayed as the total error rate as a function of the threshold, as in
FIG. 21C. At all possible thresholds, the error rate from the microarray was always smaller
than that from the clinical data. By adding the microarray data to the clinical data, the error
rate is further reduced, as one can see in FIG. 21C.
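The construction of a sensitivity vs. (1-specificity) curve and of a total error rate by varying the threshold on a predicted likelihood can be sketched as follows. This is a generic illustration; the function names and the convention that a sample is called positive when its score is at or above the threshold are assumptions, not details taken from the source.

```python
def roc_points(scores, truths):
    """List of (1 - specificity, sensitivity, threshold), one entry per distinct score.

    scores: model-predicted likelihood of the positive class, one per sample.
    truths: 1 for a truly positive sample, 0 for a truly negative sample.
    """
    pos = sum(truths)
    neg = len(truths) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, truths) if s >= thr and t == 1)
        fp = sum(1 for s, t in zip(scores, truths) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos, thr))
    return points

def total_error_rate(scores, truths, thr):
    """Fraction of samples misclassified when 'score >= thr' is called positive."""
    wrong = sum(1 for s, t in zip(scores, truths) if (s >= thr) != (t == 1))
    return wrong / len(truths)
```

A curve that passes closer to the point (0, 1) corresponds to the "top left corner" behavior described above.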

Odds ratio tables can be created from the prediction of the logistic
regression. The probability of a patient being in the good prognosis group is calculated by
the logistic regression based on different combinations of input parameters (clinical and/or
microarray). Patients are divided into the following four groups according to the prediction
and the true outcome: (1) predicted good and truly good, (2) predicted good but truly poor,
(3) predicted poor but truly good, (4) predicted poor and truly poor. Groups (1) & (4)
represent correct predictions, while groups (2) & (3) represent mis-predictions. The
division for the prediction is set at a probability of 50%, although other thresholds can be
used. The results are listed in Table 7. It is clear from Table 7 that microarray profiling
(Tables 7.3 & 7.10) outperforms any single clinical parameter (Tables 7.4-7.9) and the
combination of the clinical data (Table 7.2). Adding the microarray profiling to the clinical
data gives the best results (Table 7.1).

For microarray profiling, one can also make a similar table (Table 7.11)
without using logistic regression. In this case, the prediction was simply based on C1-C2
(greater than 0 means good prognosis, less than 0 means bad prognosis).

Table 7.1 Prediction by clinical+microarray

             Predicted good   Predicted poor
true good          39                5
true poor           4               30

Table 7.2 Prediction by clinical alone

             Predicted good   Predicted poor
true good          34               10
true poor          12               22

Table 7.3 Prediction by microarray

             Predicted good   Predicted poor
true good          39                5
true poor          10               24

Table 7.4 Prediction by grade

             Predicted good   Predicted poor
true good          23               21
true poor           5               29

Table 7.5 Prediction by ER

             Predicted good   Predicted poor
true good          35                9
true poor          21               13

Table 7.6 Prediction by PR

             Predicted good   Predicted poor
true good          35                9
true poor          18               16

Table 7.7 Prediction by size

             Predicted good   Predicted poor
true good          35                9
true poor          13               21

Table 7.8 Prediction by age

             Predicted good   Predicted poor
true good          33               11
true poor          15               19

Table 7.9 Prediction by angioinvasion

             Predicted good   Predicted poor
true good          37                7
true poor          19               15

Table 7.10 Prediction by dC (C1-C2)

             Predicted good   Predicted poor
true good          36                8
true poor           6               28

Table 7.11 No logistic regression, simply judged by C1-C2

             Predicted good   Predicted poor
true good          37                7
true poor           6               28
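For readers who wish to reduce each of the 2x2 tables above to summary figures, the sketch below computes the cross-product odds ratio and the fraction of correct predictions from the counts in Tables 7.1 to 7.3. The choice of these two summary measures, and the function names, are assumptions made for illustration; the source presents the tables themselves without derived statistics.

```python
# Counts copied from Tables 7.1-7.3, in the order:
# (true good & predicted good, true good & predicted poor,
#  true poor & predicted good, true poor & predicted poor)
TABLES = {
    "7.1 clinical+microarray": (39, 5, 4, 30),
    "7.2 clinical alone": (34, 10, 12, 22),
    "7.3 microarray": (39, 5, 10, 24),
}

def odds_ratio(gg, gp, pg, pp):
    """Cross-product ratio of a 2x2 table: (correct cells) / (incorrect cells)."""
    return (gg * pp) / (gp * pg)

def fraction_correct(gg, gp, pg, pp):
    """Share of patients whose predicted group matches the true group."""
    return (gg + pp) / (gg + gp + pg + pp)

if __name__ == "__main__":
    for name, counts in TABLES.items():
        print(name, round(odds_ratio(*counts), 2), round(fraction_correct(*counts), 3))
```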

Example 5. Concept of mini-array for diagnosis purposes.

All genes on the marker gene list for the purpose of diagnosis and prognosis
can be synthesized on a small-scale microarray using ink-jet technology. A microarray with
genes for diagnosis and prognosis can respectively or collectively be made. Each gene on
the list is represented by single or multiple oligonucleotide probes, depending on its
sequence uniqueness across the genome. This custom designed mini-array, in combination
with a sample preparation protocol, can be used as a diagnostic/prognostic kit in clinics.

Example 6. Biological significance of diagnostic marker genes

The public domain was searched for the available functional annotations for
the 430 marker genes for BRCA1 diagnosis in Table 3. The 430 diagnostic genes in Table 3
can be divided into two groups: (1) 196 genes that are highly expressed in the
BRCA1-like group; and (2) 234 genes that are highly expressed in the sporadic
group. Of the 196 BRCA1 group genes, 94 are annotated. Of the 234 sporadic group genes,
100 are annotated. The terms "T-cell", "B-cell" or "immunoglobulin" are involved in 13 of
the 94 annotated genes, and in 1 of the 100 annotated genes, respectively. Of 24,479 genes
represented on the microarrays, there are 7,586 genes with annotations to date. "T-cell",
"B-cell" and "immunoglobulin" are found in 207 of these 7,586 genes. Given this, the p-value
of the 13 "T-cell", "B-cell" or "immunoglobulin" genes in the BRCA1 group is very
significant (p-value = 1.1x10^-6). In comparison, the observation of 1 gene relating to
"T-cell", "B-cell", or "immunoglobulin" in the sporadic group is not significant (p-value =
0.18).
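The text does not state which statistical test produced these p-values. One common way to assess keyword enrichment of this kind is a hypergeometric tail probability, sketched below with the counts given above (13 of 94 annotated genes, against 207 of 7,586 annotated genes on the microarray). This is an assumed method shown for illustration, and its output should not be expected to match the reported values exactly. The function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, draws, successes, population):
    """P(exactly k keyword genes) when 'draws' genes are taken without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

def tail_at_least(k, draws, successes, population):
    """P(k or more keyword genes among the draws)."""
    upper = min(draws, successes)
    return sum(hypergeom_pmf(i, draws, successes, population) for i in range(k, upper + 1))

def tail_at_most(k, draws, successes, population):
    """P(k or fewer keyword genes among the draws)."""
    return sum(hypergeom_pmf(i, draws, successes, population) for i in range(0, k + 1))

if __name__ == "__main__":
    # BRCA1 group: 13 keyword genes among 94 annotated genes.
    print(tail_at_least(13, 94, 207, 7586))
    # Sporadic group: 1 keyword gene among 100 annotated genes.
    print(tail_at_most(1, 100, 207, 7586))
```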

The observation that BRCA1 patients have highly expressed lymphocyte
(T-cell and B-cell) genes agrees with what has been seen from pathology, namely that BRCA1
breast tumors are more frequently associated with high lymphocytic infiltration than sporadic cases
(Chappuis et al., 2000, Semin Surg Oncol 18:287-295).

Example 7. Biological significance of prognosis marker genes

A search was performed for available functional annotations for the 231
prognosis marker genes (Table 5). The markers fall into two groups: (1) 156 markers
that are highly expressed in the poor prognostic group; and (2) 75 genes that
are highly expressed in the good prognostic group. Of the 156 markers, 72 genes
are annotated; of the 75 genes, 28 genes are annotated.

Twelve of the 72 markers, but none of the 28 markers, are, or are associated
with, kinases. In contrast, of the 7,586 genes on the microarray having annotations to date,
only 471 involve kinases. On this basis, the presence of twelve kinase-related markers in
the poor prognostic group is significant (p-value = 0.001). Kinases are important regulators
of intracellular signal transduction pathways mediating cell proliferation, differentiation and
apoptosis. Their activity is normally tightly controlled and regulated. Overexpression of
certain kinases is well known to be involved in oncogenesis, such as vascular endothelial growth
factor receptor 1 (VEGFR1 or FLT1), a tyrosine kinase in the poor prognosis group, which
plays a very important role in tumor angiogenesis. Interestingly, vascular endothelial
growth factor (VEGF), VEGFR's ligand, is also found in the poor prognosis group, which means
both ligand and receptor are upregulated in poor prognostic individuals by an unknown
mechanism.

Likewise, 16 of the 72 markers, and only two of the 28 markers, are, or are
associated with, ATP-binding or GTP-binding proteins. In contrast, of the 7,586 genes on
the microarray having annotations to date, only 714 and 153 involve ATP-binding and
GTP-binding, respectively. On this basis, the presence of 16 GTP- or ATP-binding-related
markers in the poor prognosis group is significant (p-values of 0.001 and 0.0038). Thus, the
kinase- and ATP- or GTP-binding-related markers within the 72 markers can be used as
prognostic indicators.

Cancer is characterized by deregulated cell proliferation. On the simplest
level, this requires division of the cell or mitosis. By keyword searching, we found "cell
division" or "mitosis" included in the annotations of 7 genes among the 72
annotated markers from the 156 poor prognosis markers, but in none of the 28 annotated
genes from the 75 good prognosis markers. Of the 7,586 microarray markers with annotations,
"cell division" is found in 62 annotations and "mitosis" is found in 37 annotations. Based
on these findings, the p-value for finding seven cell division- or mitosis-related markers
in the poor prognosis group is estimated to be highly significant (p-value = 3.5x10^-5). In
comparison, the absence of cell division- or mitosis-related markers in the good prognosis
group is not significant (p-value = 0.69). Thus, the seven cell division- or mitosis-related
markers may be used as markers for poor prognosis.

Example 8: Construction of an artificial reference pool.

The reference pool for expression profiling in the above Examples was made
by using equal amounts of cRNAs from each individual patient in the sporadic group. In
order to have a reliable, easy-to-make, and large reference pool, a reference pool
for breast cancer diagnosis and prognosis can be constructed using synthetic nucleic acid
representing, or derived from, each marker gene. Expression of marker genes for an individual
patient sample is monitored only against the reference pool, not a pool derived from other
patients.

To make the reference pool, 60-mer oligonucleotides are synthesized
according to the 60-mer ink-jet array probe sequence for each diagnostic/prognostic reporter
gene, then double-stranded and cloned into pBluescript SK- vector (Stratagene, La Jolla,
CA), adjacent to the T7 promoter sequence. Individual clones are isolated, and the
sequences of their inserts are verified by DNA sequencing. To generate synthetic RNAs,
clones are linearized with EcoRI and a T7 in vitro transcription (IVT) reaction is performed
according to the MegaScript kit (Ambion, Austin, TX). IVT is followed by DNase
treatment of the product. Synthetic RNAs are purified on RNeasy columns (Qiagen,
Valencia, CA). These synthetic RNAs are transcribed, amplified, labeled, and mixed
together to make the reference pool. The abundance of those synthetic RNAs is adjusted
to approximate the abundance of the corresponding marker-derived transcripts in the real
tumor pool.

Example 9: Use of single-channel data and a sample pool represented by stored values.

1. Creation of a reference pool of stored values ("mathematical sample pool")
The use of ratio-based data, as in Examples 1-7 above, requires a physical
reference sample. In the above Examples, a pool of sporadic tumor samples was used as the
reference. Use of such a reference, while enabling robust prognostic and diagnostic
predictions, can be problematic because the pool is typically a limited resource. A classifier
method was therefore developed that does not require a physical sample pool, making
application of this predictive and diagnostic technique much simpler in clinical applications.

To test whether single-channel data could be used, the following procedure
was developed. First, the single channel intensity data for the 70 optimal genes, described
in Example 4, from the 78 sporadic training samples, described in the Materials and
Methods, was selected from the sporadic sample vs. tumor pool hybridization data. The 78
samples consisted of 44 samples from patients having a good prognosis and 34 samples
from patients having a poor prognosis. Next, the hybridization intensities for these samples
were normalized by dividing by the median intensity of all the biological spots on the same
microarray. Where multiple microarrays per sample were used, the average was taken
across all of the microarrays. A log transform was performed on the intensity data for each
of the 70 genes, or for the average intensity for each of the 70 genes where more than one
microarray is hybridized, and a mean log intensity for each gene across the 78 sporadic
samples was calculated. For each sample, the mean log intensities thus calculated were
subtracted from the individual sample log intensity. This figure, the mean subtracted
log(intensity), was then treated as the two color log(ratio) for the classifier by substitution
into Equation (5). For new samples, the mean log intensity is subtracted in the same
manner as noted above, and a mean subtracted log(intensity) calculated.
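The single-channel procedure just described can be sketched for the simple case of one microarray per sample. This is an illustration under stated assumptions: a base-10 logarithm is assumed (the text does not specify the base), averaging across replicate microarrays is omitted, and the function names are hypothetical.

```python
import math
from statistics import median

def normalized_log_intensities(array_intensities, marker_indices):
    """Divide by the median intensity of the array, then take log10 at the marker spots.

    Assumption: one microarray per sample and a base-10 logarithm.
    """
    med = median(array_intensities)
    return [math.log10(array_intensities[i] / med) for i in marker_indices]

def build_mathematical_pool(training_logs):
    """Mean log intensity per marker gene across the training samples."""
    n = len(training_logs)
    return [sum(column) / n for column in zip(*training_logs)]

def mean_subtracted(sample_logs, pool):
    """Mean-subtracted log(intensity), used in place of the two-color log(ratio)."""
    return [x - m for x, m in zip(sample_logs, pool)]
```

The list returned by build_mathematical_pool is the set of stored values; a new sample only needs normalized_log_intensities followed by mean_subtracted against that stored list.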

The creation of a set of mean log intensities for each gene hybridized creates
a "mathematical sample pool" that replaces the quantity-limited "material sample pool."
This mathematical sample pool can then be applied to any sample, including samples in
hand and ones to be collected in the future. This "mathematical sample pool" can be
updated as more samples become available.

2. Results 

To demonstrate that the mathematical sample pool performs a function
equivalent to the sample reference pool, the mean-subtracted-log(intensity) (single channel
data, relative to the mathematical pool) vs. the log(ratio) (hybridizations, relative to the
sample pool) was plotted for the 70 optimal reporter genes across the 78 sporadic samples,
as shown in FIG. 22. The ratio and single-channel quantities are highly correlated,
indicating both have the capability to report relative changes in gene expression. A
classifier was then constructed using the mean-subtracted-log(intensity) following exactly
the same procedure as was followed using the ratio data, as in Example 4.

As shown in FIGS. 23A and 23B, single-channel data was successful at
classifying samples based on gene expression patterns. FIG. 23A shows samples grouped
according to prognosis using single-channel hybridization data. The white line separates
samples from patients classified as having poor prognoses (below) and good prognoses
(above). FIG. 23B plots each sample as its expression data correlates with the good (open
circles) or poor (filled squares) prognosis classifier parameter. Using the "leave-one-out"
cross validation method, the classifier predicted 10 false positives out of 44 samples from
patients having a good prognosis, and 6 false negatives out of 34 samples from patients
having a poor prognosis, where a poor prognosis is considered a "positive." This outcome
is comparable to the use of the ratio-based classifier, which predicted 7 out of 44, and 6 out
of 34, respectively.

In clinical applications, it is greatly preferable to have few false negatives,
which results in fewer under-treated patients. To conform the results to this preference, a
classifier was constructed by ranking the patient samples according to their coefficients of
correlation to the "good prognosis" template, and choosing a threshold for this correlation
coefficient to allow approximately 10% false negatives, i.e., classification of a sample from
a patient with poor prognosis as one from a patient with a good prognosis. Out of the 34
poor prognosis samples used herein, this represents a tolerance of 3 out of 34 poor
prognosis patients classified incorrectly. This tolerance limit corresponds to a threshold
of 0.2727 for the coefficient of correlation to the "good prognosis" template. Results using this
threshold are shown in FIGS. 24A and 24B. FIG. 24A shows single-channel hybridization
data for samples ranked according to the coefficients of correlation with the good prognosis
classifier; samples classified as "good prognosis" lie above the white line, and those
classified as "poor prognosis" lie below. FIG. 24B shows a scatterplot of sample
correlation coefficients, with three incorrectly classified samples lying to the right of the
threshold correlation coefficient value. Using this threshold, the classifier had a false
positive rate of 15 out of the 44 good prognosis samples. This result is not very different
compared to the error rate of 12 out of 44 for the ratio based classifier.
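The threshold choice described above can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes that a sample is called "good prognosis" when its correlation to the good prognosis template is strictly above the threshold, and it picks the lowest threshold that keeps the number of poor prognosis samples called "good" within the allowed tolerance.

```python
def pick_threshold(corr_to_good, is_poor, allowed_false_negatives):
    """Lowest threshold leaving at most the allowed number of poor samples above it."""
    poor = sorted((c for c, p in zip(corr_to_good, is_poor) if p), reverse=True)
    if allowed_false_negatives >= len(poor):
        return -1.0  # every sample may be called good; correlations cannot fall below -1
    return poor[allowed_false_negatives]

def count_errors(corr_to_good, is_poor, threshold):
    """Return (false negatives, false positives), with poor prognosis as the positive class."""
    false_neg = sum(1 for c, p in zip(corr_to_good, is_poor) if p and c > threshold)
    false_pos = sum(1 for c, p in zip(corr_to_good, is_poor) if not p and c <= threshold)
    return false_neg, false_pos
```

With 34 poor prognosis samples and a tolerance of 3, pick_threshold would return the fourth-highest correlation among the poor prognosis samples.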

In summary, the 70 reporter genes carry robust information about prognosis;
the single channel data can predict the tumor outcome almost as well as the ratio based data,
while being more convenient in a clinical setting.

7. REFERENCES CITED
All references cited herein are incorporated herein by reference in their
entirety and for all purposes to the same extent as if each individual publication or patent or
patent application was specifically and individually indicated to be incorporated by
reference in its entirety for all purposes.

Many modifications and variations of the present invention can be made
without departing from its spirit and scope, as will be apparent to those skilled in the art.
The specific embodiments described herein are offered by way of example only, and the
invention is to be limited only by the terms of the appended claims along with the full scope
of equivalents to which such claims are entitled.

What is claimed is:

1. A method for classifying a cell sample as ER(+) or ER(-) comprising
detecting a difference in the expression by said cell sample of a first plurality of genes
relative to a control, said first plurality of genes consisting of at least 5 of the genes
corresponding to the markers listed in Table 1.

2. The method of claim 1, wherein said plurality consists of at least 50
of the genes corresponding to the markers listed in Table 1.

3. The method of claim 1, wherein said plurality consists of at least 100
of the genes corresponding to the markers listed in Table 1.

4. The method of claim 1, wherein said plurality consists of at least 200
of the genes corresponding to the markers listed in Table 1.

5. The method of claim 1, wherein said plurality consists of at least 500
of the genes corresponding to the markers listed in Table 1.

6. The method of claim 1, wherein said plurality consists of at least
1000 of the genes corresponding to the markers listed in Table 1.

7. The method of claim 1, wherein said plurality consists of each of the
genes corresponding to the 2,460 markers listed in Table 2.

8. The method of claim 1, wherein said plurality consists of the 550
gene markers listed in Table 2.

9. The method of claim 1, wherein said control comprises nucleic acids
derived from a pool of tumors from individual sporadic patients.

10. The method of claim 1, wherein said detecting comprises the steps of
(a) generating an ER(+) template by hybridization of nucleic acids
derived from a plurality of ER(+) patients within a plurality of sporadic patients against
nucleic acids derived from a pool of tumors from individual sporadic patients;

(b) generating an ER(-) template by hybridization of nucleic acids
derived from a plurality of ER(-) patients within said plurality of sporadic patients against
nucleic acids derived from said pool of tumors from individual sporadic patients within said
plurality;

(c) hybridizing nucleic acids derived from an individual sample
against said pool; and

(d) determining the similarity of marker gene expression in the
individual sample to the ER(+) template and the ER(-) template, wherein if said expression
is more similar to the ER(+) template, the sample is classified as ER(+), and if said
expression is more similar to the ER(-) template, the sample is classified as ER(-).

11. A method for classifying a cell sample as BRCA1-related or
sporadic, comprising detecting a difference in the expression of a first plurality of genes
relative to a control, said first plurality of genes consisting of at least 5 of the genes
corresponding to the markers listed in Table 3.

12. The method of claim 11, wherein said plurality consists of at least 50
of the genes corresponding to the markers listed in Table 3.

13. The method of claim 11, wherein said plurality consists of at least
100 of the genes corresponding to the markers listed in Table 3.

14. The method of claim 11, wherein said plurality consists of at least
200 of the genes corresponding to the markers listed in Table 3.

15. The method of claim 11, wherein said plurality consists of each of the
genes corresponding to the 430 markers listed in Table 3.

16. The method of claim 11, wherein said plurality consists of each of the
genes corresponding to the 100 markers listed in Table 4.

17. The method of claim 11, wherein said control comprises nucleic
acids derived from a pool of tumors from individual sporadic patients.

18. The method of claim 11, wherein said detecting comprises the steps
of
(a) generating a BRCA1 template by hybridization of nucleic acids
derived from a plurality of BRCA1 patients within a plurality of ER(-) patients against
nucleic acids derived from a pool of tumors;

(b) generating a sporadic template by hybridization of nucleic acids
derived from a plurality of sporadic patients within said plurality of ER(-) patients against
nucleic acids derived from said pool of tumors;

(c) hybridizing nucleic acids derived from an individual sample against
said pool; and

(d) determining the similarity of marker gene expression in the
individual sample to the BRCA1 template and the sporadic template, wherein if said
expression is more similar to the BRCA1 template, the sample is classified as BRCA1, and if
said expression is more similar to the sporadic template, the sample is classified as sporadic.

19. A method for classifying an individual as having a good prognosis
(no distant metastases within five years of initial diagnosis) or a poor prognosis (distant
metastases within five years of initial diagnosis), comprising detecting a difference in the
expression of a first plurality of genes in a cell sample taken from the individual relative to a
control, said first plurality of genes consisting of at least 5 of the genes corresponding to the
markers listed in Table 5.

20. The method of claim 19, wherein said plurality consists of at least 20
of the genes corresponding to the markers listed in Table 5.

21. The method of claim 19, wherein said plurality consists of at least
100 of the genes corresponding to the markers listed in Table 5.

22. The method of claim 19, wherein said plurality consists of at least
150 of the genes corresponding to the markers listed in Table 5.

23. The method of claim 19, wherein said plurality consists of each of the
genes corresponding to the 231 markers listed in Table 5.

24. The method of claim 19, wherein said plurality consists of the 70
gene markers listed in Table 6.

25. The method of claim 1, wherein said control comprises nucleic acids
derived from a pool of tumors from individual sporadic patients.

26. The method of claim 19, wherein said detecting comprises the steps
of:

(a) generating a good prognosis template by hybridization of nucleic
acids derived from a plurality of good prognosis patients against nucleic acids derived from
a pool of tumors from individual patients;

(b) generating a poor prognosis template by hybridization of nucleic
acids derived from a plurality of poor prognosis patients against nucleic acids derived from
said pool of tumors from said plurality of individual patients;

(c) hybridizing nucleic acids derived from an individual sample
against said pool; and

(d) determining the similarity of marker gene expression in the
individual sample to the good prognosis template and the poor prognosis template, wherein
if said expression is more similar to the good prognosis template, the sample is classified as
having a good prognosis, and if said expression is more similar to the poor prognosis
template, the sample is classified as having a poor prognosis.

27. The method of claim 1, wherein the cell sample is additionally
classified as BRCA1-related or sporadic by detecting a difference in the expression of a
second plurality of genes in a cell sample taken from the individual relative to a control,
said second plurality of genes consisting of at least 5 of the genes corresponding to the
markers listed in Table 3 or Table 4.

28. The method of claim 1, wherein the cell sample is additionally
classified as taken from a patient with a good prognosis or a poor prognosis by detecting a
difference in the expression of a second plurality of genes in a cell sample taken from the
individual relative to a control, said second plurality of genes consisting of at least 5 of the
genes corresponding to the markers listed in Table 5.

29. The method of claim 11, wherein the cell sample is additionally
classified as taken from a patient with a good prognosis or a poor prognosis by detecting a
difference in the expression of a second plurality of genes in a cell sample taken from the
individual relative to a control, said second plurality of genes consisting of at least 20 of the
genes corresponding to the markers listed in Table 5.

30. The method of claim 11, wherein the cell sample is additionally
classified as ER(+) or ER(-) by detecting a difference in the expression of a second plurality
of genes in a cell sample taken from the individual relative to a control, said second
plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in
Table 1.

31. The method of claim 19, wherein the cell sample is additionally
classified as ER(+) or ER(-) by detecting a difference in the expression of a second plurality
of genes in a cell sample taken from the individual relative to a control, said second
plurality of genes consisting of at least 5 of the genes corresponding to the markers listed in
Table 1.

32. The method of claim 19, wherein the cell sample is additionally
classified as BRCA1 or sporadic by detecting a difference in the expression of a second
plurality of genes in a cell sample taken from the individual relative to a control, said
second plurality of genes consisting of at least 5 of the genes corresponding to the markers
listed in Table 3.

33. A method for classifying a sample as ER(+) or ER(-) by calculating
the similarity between the expression of at least 5 of the markers listed in Table 1 in the
sample to the expression of the same markers in an ER(-) nucleic acid pool and an ER(+)
nucleic acid pool, comprising the steps of:

(a) labeling nucleic acids derived from a sample with a first fluorophore
to obtain a first pool of fluorophore-labeled nucleic acids;

(b) labeling with a second fluorophore a first pool of nucleic acids
derived from two or more ER(+) samples, and a second pool of nucleic acids derived from
two or more ER(-) samples;

(c) contacting said first fluorophore-labeled nucleic acid and said first
pool of second fluorophore-labeled nucleic acid with a first microarray under conditions
such that hybridization can occur, and contacting said first fluorophore-labeled nucleic acid
and said second pool of second fluorophore-labeled nucleic acid with a second microarray
under conditions such that hybridization can occur, wherein said first microarray and said
second microarray are similar to each other, exact replicas of each other, or are identical,
detecting at each of a plurality of discrete loci on the first microarray a first fluorescent
emission signal from said first fluorophore-labeled nucleic acid and a second fluorescent
emission signal from said first pool of second fluorophore-labeled genetic matter that is
bound to said first microarray under said conditions, and detecting at each of the marker loci
on said second microarray said first fluorescent emission signal from said first
fluorophore-labeled nucleic acid and a third fluorescent emission signal from said second
pool of second fluorophore-labeled nucleic acid;

(d) determining the similarity of the sample to the ER(-) and ER(+) pools
by comparing said first fluorescence emission signals and said second fluorescence
emission signals, and said first emission signals and said third fluorescence emission
signals; and

(e) classifying the sample as ER(+) where the first fluorescence emission
signals are more similar to said second fluorescence emission signals than to said third
fluorescent emission signals, and classifying the sample as ER(-) where the first
fluorescence emission signals are more similar to said third fluorescence emission signals
than to said second fluorescent emission signals.

34. The method of claim 33, wherein said similarity is calculated by
determining a first sum of the differences of expression levels for each marker between said
first fluorophore-labeled nucleic acid and said first pool of second fluorophore-labeled
nucleic acid, and a second sum of the differences of expression levels for each marker
between said first fluorophore-labeled nucleic acid and said second pool of second
fluorophore-labeled nucleic acid, wherein if said first sum is greater than said second sum,
the sample is classified as ER(-), and if said second sum is greater than said first sum, the
sample is classified as ER(+).
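
The comparison of sums recited in claim 34 can be illustrated with a short sketch. It is only an illustration under stated assumptions: the claim does not say whether the per-marker differences are signed or absolute, so absolute differences are assumed here, and the function name, argument names, and the handling of a tie are hypothetical.

```python
def classify_by_sum_of_differences(sample, er_pos_pool, er_neg_pool):
    """Classify a sample as "ER(+)" or "ER(-)" from per-marker expression levels.

    Each argument is a list with one expression level per marker, in the same
    marker order. Absolute differences are an assumption of this sketch.
    """
    if not (len(sample) == len(er_pos_pool) == len(er_neg_pool)):
        raise ValueError("all inputs must cover the same markers")

    # First sum: aggregate difference between the sample and the ER(+) pool.
    first_sum = sum(abs(s - p) for s, p in zip(sample, er_pos_pool))
    # Second sum: aggregate difference between the sample and the ER(-) pool.
    second_sum = sum(abs(s - n) for s, n in zip(sample, er_neg_pool))

    if first_sum > second_sum:
        return "ER(-)"  # farther from the ER(+) pool
    if second_sum > first_sum:
        return "ER(+)"  # farther from the ER(-) pool
    return "undetermined"  # a tie is not addressed by the claim
```

A sample whose levels track the ER(+) pool yields a small first sum and is therefore classified as ER(+).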

35. The method of claim 33, wherein said similarity is calculated by
computing a first classifier parameter $P_1$ between an ER(+) template and the expression of
said markers in said sample, and a second classifier parameter $P_2$ between an ER(-) template
and the expression of said markers in said sample, wherein said $P_1$ and $P_2$ are calculated
according to the formula:

$$P_i = \frac{\vec{z}_i \cdot \vec{y}}{\|\vec{z}_i\| \cdot \|\vec{y}\|}$$

wherein $\vec{z}_1$ and $\vec{z}_2$ are ER(+) and ER(-) templates, respectively, and are calculated by
averaging said second fluorescence emission signal for each of said markers in said first
pool of second fluorophore-labeled nucleic acid and said third fluorescence emission signal
for each of said markers in said second pool of second fluorophore-labeled nucleic acid,
respectively, and wherein $\vec{y}$ is said first fluorescence emission signal of each of said
markers in the sample to be classified as ER(+) or ER(-), wherein the expression of the
markers in the sample is similar to ER(-) if $P_1 < P_2$, and similar to ER(+) if $P_1 > P_2$.
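
The template comparison of claim 35 may be sketched as follows. The sketch assumes that each classifier parameter is a normalized dot product between a template vector and the sample vector; all function and variable names are hypothetical, and the handling of equal parameters is an assumption, since the claim addresses only the two inequalities.

```python
import math


def average_template(pool_signals):
    """Average per-marker signals over the members of a pool.

    pool_signals is a list of equal-length lists, one list per pool member.
    """
    n = len(pool_signals)
    return [sum(column) / n for column in zip(*pool_signals)]


def classifier_parameter(template, sample):
    """Normalized dot product of a template and a sample (assumed form)."""
    dot = sum(z * y for z, y in zip(template, sample))
    norm = (math.sqrt(sum(z * z for z in template))
            * math.sqrt(sum(y * y for y in sample)))
    if norm == 0:
        raise ValueError("template and sample must be non-zero vectors")
    return dot / norm


def classify_er(sample, er_pos_template, er_neg_template):
    """Return "ER(+)" if P1 > P2, "ER(-)" if P1 < P2, else "undetermined"."""
    p1 = classifier_parameter(er_pos_template, sample)  # versus ER(+) template
    p2 = classifier_parameter(er_neg_template, sample)  # versus ER(-) template
    if p1 > p2:
        return "ER(+)"
    if p1 < p2:
        return "ER(-)"
    return "undetermined"
```

Because the parameter is normalized, it depends on the pattern of expression across the markers rather than on overall signal intensity.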

36. A method for determining a set of marker genes whose expression is
associated with a particular phenotype, comprising the steps of:

(a) selecting a phenotype having two or more phenotype categories;

(b) identifying a plurality of genes wherein the expression of said genes
is correlated or anticorrelated with one of the phenotype categories, and wherein the
correlation coefficient for each gene is calculated according to the equation

$$\rho = \frac{\vec{c} \cdot \vec{r}}{\|\vec{c}\| \cdot \|\vec{r}\|}$$

wherein $\vec{c}$ is a number representing said phenotype category and $\vec{r}$ is
the logarithmic expression ratio across all the samples for each individual gene, wherein if
the correlation coefficient has an absolute value of 0.3 or greater, said expression of said
gene is associated with the phenotype category,

wherein said plurality of genes is a set of marker genes whose expression is
associated with a particular phenotype.
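
The selection step of claim 36 can be illustrated with a minimal sketch. It assumes the correlation coefficient is the normalized dot product of a category vector (one number per sample) and a gene's log expression ratios across the same samples, applied as written without mean-centering. The function names and the dictionary layout of the input are hypothetical.

```python
import math


def correlation(c, r):
    """Normalized dot product of category vector c and log-ratio vector r."""
    norm = math.sqrt(sum(x * x for x in c)) * math.sqrt(sum(x * x for x in r))
    if norm == 0:
        return 0.0  # assumption: a flat vector carries no association
    return sum(a * b for a, b in zip(c, r)) / norm


def select_markers(expression, categories, threshold=0.3):
    """Return {gene: coefficient} for genes with |coefficient| >= threshold.

    expression maps a gene name to its log ratios, one value per sample;
    categories holds one number per sample (for example 1 and -1).
    """
    selected = {}
    for gene, log_ratios in expression.items():
        rho = correlation(categories, log_ratios)
        if abs(rho) >= threshold:  # correlated or anticorrelated
            selected[gene] = rho
    return selected
```

Both strongly correlated and strongly anticorrelated genes pass the threshold, because the test is applied to the absolute value of the coefficient.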

37. The method of claim 36, wherein said set of marker genes is
validated by:

(a) using a statistical method to randomize the association between said
marker genes and said phenotype category, thereby creating a control correlation coefficient
for each marker gene;

(b) repeating step (a) one hundred or more times to develop a frequency
distribution of said control correlation coefficients for each marker gene;

(c) determining the number of marker genes having a control correlation
coefficient of 0.3 or above, thereby creating a control marker gene set; and

(d) comparing the number of control marker genes so identified to the
number of marker genes, wherein if the p value of the difference between the number of
marker genes and the number of control genes is less than a threshold, said set of marker
genes is validated.
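
One way to picture the validation of claim 37 is a permutation test, sketched below. This is an assumption-laden illustration rather than the claimed procedure itself: the phenotype labels are shuffled to randomize the association, the threshold is applied to the absolute value of the coefficient, and the p value is taken as the empirical fraction of randomized runs that yield at least as many genes as the unshuffled data. The names, the seed parameter, and the p value estimator are all hypothetical.

```python
import math
import random


def correlation(c, r):
    """Normalized dot product of category vector c and log-ratio vector r."""
    norm = math.sqrt(sum(x * x for x in c)) * math.sqrt(sum(x * x for x in r))
    if norm == 0:
        return 0.0  # assumption: a flat vector carries no association
    return sum(a * b for a, b in zip(c, r)) / norm


def count_markers(expression, categories, threshold=0.3):
    """Count genes whose |coefficient| reaches the threshold."""
    return sum(1 for r in expression.values()
               if abs(correlation(categories, r)) >= threshold)


def permutation_p_value(expression, categories, n_permutations=100,
                        threshold=0.3, seed=0):
    """Return (observed marker count, empirical p value)."""
    rng = random.Random(seed)
    observed = count_markers(expression, categories, threshold)
    shuffled = list(categories)
    at_least_as_many = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # randomize the gene/phenotype association
        if count_markers(expression, shuffled, threshold) >= observed:
            at_least_as_many += 1
    # Add-one estimator keeps the p value strictly above zero.
    return observed, (at_least_as_many + 1) / (n_permutations + 1)
```

Under this sketch the marker set would be treated as validated when the returned p value falls below a chosen threshold.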

38. The method of claim 36, wherein said set of marker genes is
optimized by the method comprising:

(a) rank-ordering the genes by amplitude of correlation or by significance
of the correlation coefficients to create a rank-ordered list, and

(b) selecting an arbitrary number n of marker genes from the top of the
rank-ordered list.

39. The method of claim 38, wherein said set of marker genes is further
optimized by the method comprising:

(a) calculating an error rate for said arbitrary number n of marker genes;

(b) increasing by 1 the number of genes selected from the top of the
rank-ordered list;

(c) calculating an error rate for said number of genes selected from the
top of the rank-ordered list;

(d) repeating steps (b) and (c) until said number of genes selected from
the top of the rank-ordered list includes all genes included in said rank-ordered list, and

(e) identifying said number of genes selected from the top of the
rank-ordered list for which the error rate is smallest,

wherein said set of marker genes is optimized when the error rate is the
smallest.
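
The rank-ordering of claim 38 and the stepwise search of claim 39 can be sketched as a simple loop. The claims do not specify how the error rate is computed, so the sketch takes it as a caller-supplied function; that function, the other names, and the choice to keep the smallest gene count on ties are hypothetical.

```python
def rank_genes(correlations):
    """Order gene names by decreasing absolute correlation coefficient."""
    return sorted(correlations, key=lambda g: abs(correlations[g]), reverse=True)


def optimize_marker_count(ranked_genes, error_rate, n_start=1):
    """Return (best_n, best_error) over the top-n prefixes of the ranked list.

    error_rate is a caller-supplied function that takes a list of genes and
    returns the classification error obtained with that list.
    """
    best_n, best_error = None, None
    # Steps (a) to (d): grow the selection one gene at a time to the full list.
    for n in range(n_start, len(ranked_genes) + 1):
        err = error_rate(ranked_genes[:n])
        # Step (e): remember the gene count with the smallest error rate.
        if best_error is None or err < best_error:
            best_n, best_error = n, err
    return best_n, best_error
```

In practice the error rate might come from a cross-validation procedure; the sketch leaves that choice to the caller.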

40. A method for assigning a person to one of a plurality of categories in
a clinical trial, comprising determining for each said person the level of expression of at
least five of the prognosis markers listed in Table 6, determining therefrom whether the
person has an expression pattern that correlates with a good prognosis or a poor prognosis,
and assigning said person to one category in a clinical trial if said person is determined to
have a good prognosis, and a different category if that person is determined to have a poor
prognosis.

41. A method of classifying a first cell or organism as having one of at
least two different phenotypes, said at least two different phenotypes comprising a first
phenotype and a second phenotype, said method comprising:

(a) comparing the level of expression of each of a plurality of genes in a
first sample from the first cell or organism to the level of expression of each of said genes,
respectively, in a pooled sample from a plurality of cells or organisms, said plurality of cells
or organisms comprising different cells or organisms exhibiting said at least two different
phenotypes, respectively, to produce a first compared value;

(b) comparing said first compared value to a second compared value,
wherein said second compared value is the product of a method comprising comparing the
level of expression of each of said genes in a sample from a cell or organism characterized
as having said first phenotype to the level of expression of each of said genes, respectively,
in said pooled sample;

(c) comparing said first compared value to a third compared value,
wherein said third compared value is the product of a method comprising comparing the
level of expression of each of said genes in a sample from a cell or organism characterized
as having said second phenotype to the level of expression of each of said genes,
respectively, in said pooled sample;

(d) optionally carrying out one or more times a step of comparing said
first compared value to one or more additional compared values, respectively, each
additional compared value being the product of a method comprising comparing the level of
expression of each of said genes in a sample from a cell or organism characterized as having
a phenotype different from said first and second phenotypes but included among said at
least two different phenotypes, to the level of expression of each of said genes, respectively,
in said pooled sample; and

(e) determining to which of said second, third and, if present, one or
more additional compared values, said first compared value is most similar;

wherein said first cell or organism is determined to have the phenotype of the
cell or organism used to produce said compared value most similar to said first compared
value.

42. The method of claim 40, wherein said compared values are each
ratios of the levels of expression of each of said genes.

43. The method of claim 40, wherein each of said levels of expression of
each of said genes in said pooled sample are normalized prior to any of said comparing
steps.

44. The method of claim 42 wherein normalizing said levels of
expression is carried out by dividing each of said levels of expression by the median or
mean level of expression of each of said genes or dividing by the mean or median level of
expression of one or more housekeeping genes in said pooled sample.

45. The method of claim 42 wherein said normalized levels of expression
are subjected to a log transform and said comparing steps comprise subtracting said log
transform from the log of said levels of expression of each of said genes in said sample
from said cell or organism.
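
The normalization and log-transform steps of claims 44 and 45 can be illustrated with a small sketch. It assumes median normalization, a base-10 logarithm, and that the sample levels are normalized in the same way as the pooled levels; the claims leave each of these choices open, and the function names are hypothetical.

```python
import math
import statistics


def normalize(levels):
    """Divide each expression level by the median level (assumed choice)."""
    median = statistics.median(levels)
    if median <= 0:
        raise ValueError("median expression level must be positive")
    return [level / median for level in levels]


def log_ratios(sample_levels, pool_levels):
    """Per-gene log10(sample) minus log10(pool), after normalization."""
    if len(sample_levels) != len(pool_levels):
        raise ValueError("sample and pool must cover the same genes")
    if min(sample_levels) <= 0 or min(pool_levels) <= 0:
        raise ValueError("expression levels must be positive")
    sample = normalize(sample_levels)
    pool = normalize(pool_levels)
    # Subtracting logs is equivalent to taking the log of the ratio.
    return [math.log10(s) - math.log10(p) for s, p in zip(sample, pool)]
```

A gene expressed at the same relative level in the sample and the pool gives a log ratio of zero, while positive and negative values indicate relative over- and under-expression.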

46. The method of claim 40, wherein said at least two different
phenotypes are different stages of a disease or disorder.

47. The method of claim 40, wherein said at least two different 
phenotypes are different prognoses of a disease or disorder. 

48. The method of claim 40, wherein said levels of expression of each of
said genes, respectively, in said pooled sample or said levels of expression of each of said
genes in a sample from said cell or organism characterized as having said first phenotype,
said second phenotype, or said phenotype different from said first and second phenotypes,
respectively, are stored on a computer.

49. A microarray comprising at least 5 markers derived from any one of 
Tables 1-6, wherein at least 50% of the probes on the microarray are present in any one of 
Tables 1-6. 

50. The microarray of claim 48, wherein at least 70% of the probes on 
the microarray are present in any one of Tables 1-6. 

51. The microarray of claim 48, wherein at least 80% of the probes on
the microarray are present in any one of Tables 1-6.

52. The microarray of claim 48, wherein at least 90% of the probes on
the microarray are present in any one of Tables 1-6.

53. The microarray of claim 48, wherein at least 95% of the probes on
the microarray are present in any one of Tables 1-6.

54. The microarray of claim 48, wherein at least 98% of the probes on 
the microarray are present in any one of Tables 1-6. 

55. A microarray for distinguishing ER(+) and ER(-) cell samples
comprising a positionally-addressable array of polynucleotide probes bound to a support,
said polynucleotide probes comprising a plurality of polynucleotide probes of different
nucleotide sequences, each of said different nucleotide sequences comprising a sequence
complementary and hybridizable to a different gene, said plurality consisting of at least 20
of the genes corresponding to the markers listed in Table 1 or Table 2, wherein at least 50%
of the probes on the microarray are present in Table 1 or Table 2.

56. A microarray for distinguishing BRCA1-related and sporadic cell
samples comprising a positionally-addressable array of polynucleotide probes bound to a
support, said polynucleotide probes comprising a plurality of polynucleotide probes of
different nucleotide sequences, each of said different nucleotide sequences comprising a
sequence complementary and hybridizable to a different gene, said plurality consisting of at
least 20 of the genes corresponding to the markers listed in Table 3 or Table 4, wherein at
least 50% of the probes on the microarray are present in Table 3 or Table 4.

57. A microarray for distinguishing cell samples from individuals having
a good prognosis and cell samples from individuals having a poor prognosis, comprising a
positionally-addressable array of polynucleotide probes bound to a support, said
polynucleotide probes comprising a plurality of polynucleotide probes of different
nucleotide sequences, each of said different nucleotide sequences comprising a sequence
complementary and hybridizable to a different gene, said plurality consisting of at least 20 of the
genes corresponding to the markers listed in Table 5 or Table 6, wherein at least 50% of the
probes on the microarray are present in Table 5 or Table 6.

58. A kit for determining whether a sample contains a BRCA1 or
sporadic mutation, comprising at least one microarray comprising probes to at least 20 of
the genes corresponding to the markers listed in Table 3, and a computer readable medium
having recorded thereon one or more programs for determining the similarity of the level of
nucleic acid derived from the markers listed in Table 3 in a sample to that in a BRCA1 pool
and a sporadic tumor pool, wherein the one or more programs cause a computer to perform
a method comprising computing the aggregate differences in expression of each marker
between the sample and BRCA1 and the aggregate differences in expression of each marker
between the sample and sporadic pool, or a method comprising determining the correlation
of expression of the markers in the sample to the expression in the BRCA1 and sporadic
pools, said correlation calculated according to Equation (3).

59. A kit for determining the ER-status of a sample, comprising at least
one microarray comprising probes to at least 20 of the genes corresponding to the markers
listed in Table 1, and a computer readable medium having recorded thereon one or more
programs for determining the similarity of the level of nucleic acid derived from the
markers listed in Table 1 in a sample to that in an ER(-) pool and an ER(+) pool, wherein
the one or more programs cause a computer to perform a method comprising computing the
aggregate differences in expression of each marker between the sample and ER(-) pool and
the aggregate differences in expression of each marker between the sample and ER(+) pool,
or a method comprising determining the correlation of expression of the markers in the
sample to the expression in the ER(-) and ER(+) pools, said correlation calculated according
to Equation (3).

60. A kit for determining whether a sample is derived from a patient
having a good prognosis or a poor prognosis, comprising at least one microarray comprising
probes to at least 20 of the genes corresponding to the markers listed in Table 5, and a
computer readable medium having recorded thereon one or more programs for determining
the similarity of the level of nucleic acid derived from the markers listed in Table 5 in a
sample to that in a pool of samples derived from individuals having a good prognosis and a
pool of samples derived from individuals having a poor prognosis, wherein the one or more
programs cause a computer to perform a method comprising computing the aggregate
differences in expression of each marker between the sample and the good prognosis pool
and the aggregate differences in expression of each marker between the sample and the poor
prognosis pool, or a method comprising determining the correlation of expression of the
markers in the sample to the expression in the good prognosis and poor prognosis pools,
said correlation calculated according to Equation (3).